
keepalived issue with haproxy #2485

Open
adnanhamdussalam opened this issue Oct 22, 2024 · 15 comments

Comments

@adnanhamdussalam

Hi,

Servers: test1, test2

I have already configured keepalived against two HAProxy servers, and I am able to move the VIP to the other server when the haproxy service goes down on one server (test1). Now, with the haproxy service running on test2 and the VIP also on test2, I start keepalived on test1, which has the lower priority. When I then shut down the haproxy service on test2, keepalived does not move the VIP back to test1 because of test1's lower priority.

Any idea or possibility to do it?

@pqarmitage
Collaborator

You will need to provide copies of your keepalived configurations, and also any track_scripts that you are using. Then we can have a look at it.

@adnanhamdussalam
Author

PFB the configuration settings:

server 1:

[root@testbed06 postgres]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20    # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50    # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

vrrp_instance VI_1 {
    state MASTER              # Set this node as MASTER
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must be the same on both nodes)
    priority 101              # Priority (higher number means higher priority)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72          # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[root@testbed06 postgres]# cat "/etc/keepalived/chk_haproxy_advanced.sh"
#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh [email protected] "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

server 2:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -20    # Apply this weight if HAProxy is down on this node
    fall 2
    rise 2
    # If exit code is 1 (this node's HAProxy is down)
}

vrrp_script chk_both_haproxy_down {
    script "/etc/keepalived/chk_haproxy_advanced.sh"
    interval 2
    weight -50    # Apply this weight if both nodes' HAProxy services are down
    fall 2
    rise 2
    # If exit code is 2 (both nodes' HAProxy are down)
}

#vrrp_script chk_master_haproxy {
#    script "ssh [email protected] 'killall -0 haproxy' || echo 1"
#    interval 5
#    weight 10
#}

vrrp_instance VI_1 {
    state BACKUP              # Set this node as BACKUP
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must match the MASTER node)
    priority 100              # Priority (lower than MASTER)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72          # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy_down
        chk_both_haproxy_down
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}
[postgres@testbed09-1664 ~]$ cat /etc/keepalived/chk_haproxy_advanced.sh
#!/bin/bash

# Define the path to Keepalived's control socket or state file (if applicable)
KEEPALIVED_VRRP_INSTANCE="VI_1"

# Local HAProxy status check
if killall -0 haproxy >/dev/null 2>&1; then
    # If HAProxy is running on this node, ensure full priority
    echo "HAProxy is running on this node."
    ip vrf exec $KEEPALIVED_VRRP_INSTANCE 100  # Example: Set priority back to full
    exit 0
else
    # HAProxy is down on this node, check the other node's HAProxy status
    ssh [email protected] "killall -0 haproxy >/dev/null 2>&1"
    sleep 10
    if [ $? -ne 0 ]; then
        # Both HAProxy services are down, reduce priority drastically
        echo "Both HAProxy services are down. Reducing priority drastically."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 50  # Example: Reduce priority significantly
        exit 2
    else
        # Only this node's HAProxy is down, reduce priority moderately
        echo "HAProxy is down on this node. Reducing priority moderately."
        ip vrf exec $KEEPALIVED_VRRP_INSTANCE 80  # Example: Reduce priority moderately
        exit 1
    fi
fi

@pqarmitage
Collaborator

There appear to be a number of issues:

  1. You have added sleep 10 after the ssh postgres@ ... command in the chk_haproxy_advanced.sh script. The exit code of sleep will be 0, and so the else block will always be executed, and the result of ssh postgres@... will always be ignored.
  2. I don't know what ip vrf exec VI_1 100 (or 80 or 50) is expected to do, unless you have commands named 100, 80 and 50. Have you created a vrf named VI_1?
  3. vrrp_scripts chk_haproxy_down and chk_both_haproxy_down both call shell script /etc/keepalived/chk_haproxy_advanced.sh, and so the vrrp_scripts will either both be up or both be down.
  4. /etc/keepalived/chk_haproxy_advanced.sh can exit with exit codes of 0, 1 or 2. keepalived just checks whether the exit code is 0 or not 0, so keepalived will not be aware of any difference between an exit code of 1 or 2, although currently your script will never exit with exit code 2 (see point 1. above).
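
A minimal sketch of the exit-code handling in point 1 (the function name classify_haproxy and its two status arguments are hypothetical, standing in for the real local killall check and the remote ssh check):

```shell
#!/bin/sh
# classify_haproxy: derive the check script's exit code from the local and
# remote haproxy states (0 = running, non-zero = down). Illustrative helper;
# in the real script the remote state comes from ssh, and its $? must be
# saved into a variable immediately, before sleep (or anything else) runs:
#     ssh [email protected] "killall -0 haproxy"
#     remote_status=$?   # capture now -- a later sleep would overwrite $?
#     sleep 10
classify_haproxy() {
    local_status=$1
    remote_status=$2
    if [ "$local_status" -eq 0 ]; then
        return 0    # local haproxy is running
    fi
    if [ "$remote_status" -ne 0 ]; then
        return 2    # both nodes' haproxy are down
    fi
    return 1        # only the local haproxy is down
}

classify_haproxy 0 0; echo $?   # prints 0 (both up)
classify_haproxy 1 0; echo $?   # prints 1 (local down, peer up)
classify_haproxy 1 1; echo $?   # prints 2 (both down)
```

Note that keepalived itself only distinguishes zero from non-zero exit codes (point 4), so the distinction between 1 and 2 would still need a different mechanism.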

I don't know why keepalived is not taking over as master on test1 when keepalived is stopped on test2, but I suggest you correct the issues identified above first, and if you are still experiencing the original problem you will need to post the full keepalived logs from both systems. Also, if you execute kill -USR1 $(cat /var/run/keepalived.pid) when test1 has not taken over as master keepalived will produce a file /tmp/keepalived.data, and it would be helpful if you posted that as well.

@adnanhamdussalam
Author

I have changed the settings now:

Master:

[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2    # Check every 2 seconds
    weight -10    # Reduce priority by 10 if the script fails
}

vrrp_instance VI_1 {
    state MASTER              # Set this node as MASTER
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must be the same on both nodes)
    priority 101              # Priority (higher number means higher priority)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match on both nodes)
    }
    virtual_ipaddress {
        10.114.16.72          # Virtual IP address (VIP)
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
    preempt
}

Backup:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2   # Check every 2 seconds
    weight -2    # Reduce priority by 2 if the script fails
}

vrrp_instance VI_1 {
    state BACKUP              # Set this node as BACKUP
    interface enp1s0          # Network interface to monitor
    virtual_router_id 51      # VRRP ID (must match the MASTER node)
    priority 100              # Priority (lower than MASTER)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass 1234        # Authentication password (must match the MASTER)
    }
    virtual_ipaddress {
        10.114.16.72          # Same VIP as the MASTER node
    }
    track_script {
        chk_haproxy
    }

    notify_master /etc/keepalived/start_haproxy.sh
    notify_backup /etc/keepalived/stop_haproxy.sh
}

When I shut down haproxy on testbed09, the VIP does not move to testbed06 because the priority on it is 91.
How can I control this issue?

PFB the logs of both; I am unable to find the /tmp/keepalived

master log output:

[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
Active: active (running) since Tue 2024-10-29 12:31:42 EDT; 2min 3s ago
Main PID: 2761615 (keepalived)
Tasks: 2 (limit: 201936)
Memory: 1.9M
CPU: 1.006s
CGroup: /system.slice/keepalived.service
├─2761615 /usr/sbin/keepalived --dont-fork -D
└─2761616 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:32:32 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 98 to 100
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:35 testbed09-1664 Keepalived_vrrp[2761616]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: Script chk_haproxy now returning 1
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:33:16 testbed09-1664 Keepalived_vrrp[2761616]: (VI_1) Changing effective priority from 100 to 98

backup log:

[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
Active: active (running) since Mon 2024-10-28 06:37:58 EDT; 1 day 5h ago
Main PID: 352793 (keepalived)
Tasks: 2 (limit: 98870)
Memory: 2.0M
CPU: 8min 26.411s
CGroup: /system.slice/keepalived.service
├─352793 /usr/sbin/keepalived --dont-fork -D
└─352794 /usr/sbin/keepalived --dont-fork -D

Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:30:10 testbed06 Keepalived_vrrp[352794]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: Script chk_haproxy now returning 1
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Oct 29 12:32:27 testbed06 Keepalived_vrrp[352794]: (VI_1) Changing effective priority from 101 to 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Master received advert from 10.114.16.64 with higher priority 98, ours 91
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) Entering BACKUP STATE
Oct 29 12:32:30 testbed06 Keepalived_vrrp[352794]: (VI_1) removing VIPs.

@pqarmitage
Collaborator

The file produced by kill -USR1 ... is /tmp/keepalived.data, not /tmp/keepalived.

I think your problem is that when a VRRP instance is in the backup state, you stop haproxy, and only start it again once the VRRP instance transitions to master state.

So when keepalived is running on both testbed06 and testbed09, the VRRP instances start in backup mode (and haproxy is not running), so testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2). testbed06 has higher priority and so becomes VRRP master, haproxy is started, and the VRRP instance priority increases to 101.

You then stop haproxy on testbed06, and so the VRRP priority reduces to 99, but this is still higher than testbed09, and so testbed06 remains the VRRP master.

I think you need to not have keepalived starting and stopping haproxy, and then it should work.

@adnanhamdussalam
Author

Thank you for the update. I have removed the starting and stopping of haproxy in keepalived, but I am still having the same issue: testbed06 has priority 99 (101 - 2) and testbed09 has priority 98 (100 - 2), so after stopping the service on testbed06 its priority becomes 99, which is still higher than testbed09's 98, and the VIP does not switch.

I think failover (HA) for haproxy is not possible with keepalived, to the best of my knowledge and after performing many tests.

Kindly share your expert opinion on the above test case.

@pqarmitage
Collaborator

Based on the configuration you have provided above, your statement that testbed06 has priority 99 and testbed09 has priority 98 means that haproxy is not running on either system. You need to ensure that haproxy is permanently running on both systems, i.e. enable the haproxy service using systemctl.
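
On a systemd-based system such as these testbeds, that would be (the unit name haproxy is taken from the systemctl status output later in this thread; verify it for your distribution):

```shell
# Start haproxy now and have systemd start it at every boot,
# independently of keepalived, on BOTH nodes:
sudo systemctl enable --now haproxy

# Verify on each node:
systemctl is-enabled haproxy   # expect "enabled"
systemctl is-active haproxy    # expect "active"
```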

There appear to be quite a few websites that describe how to use keepalived with haproxy, such as:
https://medium.com/@kemalozz/installation-of-haproxy-and-keepalived-for-high-availability-f1d6e7b8982a
https://sysadmins.co.za/achieving-high-availability-with-haproxy-and-keepalived-building-a-redundant-load-balancer/
https://www.digitalocean.com/community/tutorials/how-to-set-up-highly-available-haproxy-servers-with-keepalived-and-reserved-ips-on-ubuntu-14-04
https://docs.vmware.com/en/vRealize-Operations/8.10/vrops-manager-load-balancing/GUID-EC001888-776B-42D5-9843-719EF08AB940.html
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/load_balancer_administration/s2-lvs-keepalived-haproxy-vsa

and these should give you some guidance on what you need to do.

@adnanhamdussalam
Author

Thank you for the update.
It is not possible to run haproxy on both servers, because haproxy binds to the VIP, and the haproxy service only comes up on the server where the VIP resides.
When I try to start the haproxy service on the server where the VIP does not reside, haproxy simply errors out and the service exits.

@pqarmitage
Collaborator

The sample configurations I have seen don't specify an IP address to bind to, e.g.:

frontend my_frontend
  bind *:80
  default_backend my_backend

Doing it that way, you should not have a problem with the VIP not being present.
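
An alternative sometimes used when haproxy must bind the VIP address explicitly (not suggested above, so treat it as an assumption to verify) is the Linux ip_nonlocal_bind sysctl, which lets a process bind an address the node does not currently hold:

```shell
# Allow binds to non-local addresses, persistently across reboots
# (the sysctl.d drop-in file name is illustrative):
echo 'net.ipv4.ip_nonlocal_bind = 1' | sudo tee /etc/sysctl.d/99-vip-bind.conf
sudo sysctl --system
```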

@adnanhamdussalam
Author

Thank you for the update.
After changing the bind to *, psql can now connect to PostgreSQL via the IP whether or not the VIP is present on the node.

I still think HA is not possible for haproxy using keepalived.

Any idea?

PFB the output:

Backup testbed09:

Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:55:49 testbed09-1664 Keepalived_vrrp[3136805]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Master received advert from 10.114.16.50 with higher priority 102, ours 101
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) Entering BACKUP STATE
Oct 31 06:57:16 testbed09-1664 Keepalived_vrrp[3136805]: (VI_1) removing VIPs.

[postgres@testbed09-1664 keepalived]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

Master testbed06:

Oct 31 06:57:16 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Oct 31 06:57:21 testbed06 Keepalived_vrrp[34771]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
[postgres@testbed06 haproxy]$ psql -h 10.114.16.72 -p 5000 -U postgres -d mydb
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#

@pqarmitage
Collaborator

I think you probably need to post your haproxy configuration in order for us to be able to comment any further. I can then try and reproduce the problem.

It is worth trying to access postgresql from a third machine, and not testbed06 or testbed09, in the first instance. There can be complications when trying to forward connections when they originate on the same system as is doing the forwarding, although I don't know if that applies to haproxy. If you test it this way, then you can use tcpdump or wireshark to see what is happening to the packets and identify where the problem lies. You might find it works better if you use

Given the number of articles on the web that describe how to use haproxy and keepalived together, I think it is very unlikely that HA is not possible for haproxy using keepalived.

@pqarmitage
Collaborator

I've just seen that I didn't finish the second paragraph. I intended to suggest that you add use_vmac to the vrrp_instance block. That would have the advantage that the MAC address associated with the VIP does not change when the backup takes over as master. However, it does mean that the backup instance would not be able to communicate with the VIP on the other system, since the advertised MAC address for the VIP would be locally configured on the backup.

@adnanhamdussalam
Author

adnanhamdussalam commented Nov 1, 2024

Thank you for the update.

I will try it from my third system and will share the results in a while.
PFB my current settings:

MASTER:

[postgres@testbed06 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface enp1s0
    state MASTER
    priority 100
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.50   # This node
    unicast_peer {
        10.114.16.64              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}

BACKUP:

[postgres@testbed09-1664 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # widely used idiom
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface enp1s0
    state BACKUP
    priority 99
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        10.114.16.72/24
    }
    unicast_src_ip 10.114.16.64   # This node
    unicast_peer {
        10.114.16.50              # Other nodes
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/start_haproxy.sh
}
[postgres@testbed09-1664 ~]$

@adnanhamdussalam
Author

I have tried from the third system and it is accessible, but I am still facing the priority issue: I stopped the haproxy service on the master but the VIP did not switch. PFB the details:

Master output :

[postgres@testbed06 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
Active: active (running) since Fri 2024-11-01 06:58:00 EDT; 8min ago
Main PID: 178716 (keepalived)
Tasks: 2 (limit: 98870)
Memory: 1.9M
CPU: 2.373s
CGroup: /system.slice/keepalived.service
├─178716 /usr/sbin/keepalived --dont-fork -D
└─178717 /usr/sbin/keepalived --dont-fork -D

Nov 01 06:58:04 testbed06 Keepalived_vrrp[178717]: (VI_1) Changing effective priority from 100 to 102
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:09 testbed06 Keepalived_vrrp[178717]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: Script chk_haproxy now returning 1
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Nov 01 07:00:12 testbed06 Keepalived_vrrp[178717]: (VI_1) Changing effective priority from 102 to 100

BACKUP Output:

[postgres@testbed09-1664 ~]$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; preset: disabled)
Active: active (running) since Fri 2024-11-01 06:43:39 EDT; 23min ago
Main PID: 3396580 (keepalived)
Tasks: 2 (limit: 201936)
Memory: 1.9M
CPU: 254ms
CGroup: /system.slice/keepalived.service
├─3396580 /usr/sbin/keepalived --dont-fork -D
└─3396582 /usr/sbin/keepalived --dont-fork -D

Nov 01 06:57:18 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Sending/queueing gratuitous ARPs on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:57:23 testbed09-1664 Keepalived_vrrp[3396582]: Sending gratuitous ARP on enp1s0 for 10.114.16.72
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Master received advert from 10.114.16.50 with higher priority 100, ours 99
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) Entering BACKUP STATE
Nov 01 06:58:04 testbed09-1664 Keepalived_vrrp[3396582]: (VI_1) removing VIPs.
[postgres@testbed09-1664 ~]$ systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; preset: disabled)
Active: active (running) since Fri 2024-11-01 06:40:12 EDT; 27min ago
Process: 3396049 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 3396052 (haproxy)
Status: "Ready."
Tasks: 17 (limit: 201936)
Memory: 20.1M
CPU: 121ms
CGroup: /system.slice/haproxy.service
├─3396052 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
└─3396054 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Nov 01 06:40:12 testbed09-1664 systemd[1]: Starting HAProxy Load Balancer...
Nov 01 06:40:12 testbed09-1664 haproxy[3396052]: [NOTICE] (3396052) : New worker (3396054) forked
Nov 01 06:40:12 testbed09-1664 haproxy[3396052]: [NOTICE] (3396052) : Loading success.
Nov 01 06:40:12 testbed09-1664 systemd[1]: Started HAProxy Load Balancer.

@pqarmitage
Collaborator

What the logs show is that at 06:58:04 testbed09 had priority 99 and at that time testbed06 had priority 100. At the same time, testbed06 increased its priority from 100 to 102 (so presumably the track_script detected that haproxy had started). At 07:00:12 chk_haproxy started returning 1 on testbed06, and so it reduced its priority to 100. This was still higher than the priority on testbed09 (99), and so testbed06 remained as master.

It would appear that on testbed09 for some reason the track_script chk_haproxy is not correctly seeing that haproxy is running and is therefore returning a non-zero exit code, causing the priority to remain at 99. I suggest you post the full keepalived logs on both systems from the time that keepalived started up so that we can see what is actually happening. Just seeing the last few lines of the log entries from the output of systemctl status keepalived is really not sufficient.
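
Assuming journald, as the systemctl status output indicates, the full log and the state dump can be collected on each node with something like:

```shell
# Entire keepalived log since boot, without the pager:
journalctl -u keepalived --no-pager > keepalived-$(hostname).log

# Trigger the VRRP state dump (written to /tmp/keepalived.data):
sudo kill -USR1 "$(cat /var/run/keepalived.pid)"
```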
