Add hacluster integration #308

Merged: 3 commits into ops from gkk/hacluster on Nov 29, 2023
Conversation

@Cynerva (Contributor) commented on Oct 31, 2023

Add hacluster integration to the ops version of kubernetes-control-plane. WIP.

This adds the ha relation endpoint and two config options: ha-cluster-vip and ha-cluster-dns. The implementation in this PR can register VIPs or DNS records with hacluster. If configured, those VIPs/hostnames are used as the Kubernetes API endpoints in the kubeconfigs consumed by kubelet, kube-proxy, and end users.
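For illustration, here is a minimal sketch of how options like these could be read from charm config and handed to the ha relation in an ops charm. The `HAClusterRequirer` class, its `add_vip`/`add_dns_record` methods, and the space-separated config format are placeholders invented for this sketch, not the interface library or schema this PR actually uses.

```python
# Sketch only: how ha-cluster-vip / ha-cluster-dns might be read from charm
# config and registered over the "ha" relation. HAClusterRequirer and its
# methods are invented placeholders, not the library used in this PR.
import ipaddress
from typing import List

import ops


class HAClusterRequirer:
    """Illustrative stand-in for the hacluster ("ha") relation interface."""

    def __init__(self, charm: ops.CharmBase, endpoint: str = "ha"):
        self.charm = charm
        self.endpoint = endpoint
        self.vips: List[str] = []
        self.dns_records: List[str] = []

    def add_vip(self, vip: str) -> None:
        # A real interface would publish the VIP on the relation so hacluster
        # creates the corresponding Pacemaker resource.
        self.vips.append(vip)

    def add_dns_record(self, name: str) -> None:
        self.dns_records.append(name)


class KubernetesControlPlaneCharm(ops.CharmBase):
    """Minimal sketch of the config-to-relation wiring."""

    def __init__(self, *args):
        super().__init__(*args)
        self.hacluster = HAClusterRequirer(self, endpoint="ha")
        self.framework.observe(self.on.config_changed, self._configure_hacluster)

    def _configure_hacluster(self, event: ops.EventBase) -> None:
        for vip in self.model.config.get("ha-cluster-vip", "").split():
            ipaddress.ip_address(vip)  # fail early on a malformed address
            self.hacluster.add_vip(vip)
        for name in self.model.config.get("ha-cluster-dns", "").split():
            self.hacluster.add_dns_record(name)

    def kube_api_endpoints(self) -> List[str]:
        """When configured, VIPs/hostnames become the API endpoints in kubeconfigs."""
        return (
            self.hacluster.dns_records
            or self.hacluster.vips
            or ["127.0.0.1"]  # placeholder fallback; the real charm uses unit addresses
        )
```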

However, this work so far is missing any sort of failover mechanism. If kube-apiserver goes down on the unit that is holding the VIP, it will continue to hold that VIP. In the reactive charm, failover was handled by the charm during update-status hooks, where it would check the status of control-plane systemd services and update Pacemaker node status accordingly. See here. This is obviously not ideal since it can mean failovers take up to 5 minutes with default Juju configuration, and could stop occurring entirely if the charm is in a bad state.
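For readers unfamiliar with that mechanism, a rough sketch of the reactive-style approach is below: on each update-status hook, check the local API server and toggle the Pacemaker node between online and standby so the VIP migrates away from an unhealthy unit. The systemd unit name and the use of crmsh's `crm node standby`/`crm node online` commands are assumptions for illustration, not necessarily what the reactive charm or this PR invoke.

```python
# Sketch only: reactive-style VIP failover driven by update-status.
# The service name and the crm invocations are assumptions for illustration.
import subprocess

import ops


def _service_active(service: str) -> bool:
    """Return True if the systemd unit reports 'active'."""
    result = subprocess.run(
        ["systemctl", "is-active", service],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "active"


class UpdateStatusFailover(ops.Object):
    """Toggle this unit's Pacemaker membership based on kube-apiserver health."""

    def __init__(self, charm: ops.CharmBase):
        super().__init__(charm, "ha-failover")
        charm.framework.observe(charm.on.update_status, self._on_update_status)

    def _on_update_status(self, event: ops.EventBase) -> None:
        if _service_active("snap.kube-apiserver.daemon"):
            # Healthy: make sure the node is eligible to hold the VIP.
            subprocess.run(["crm", "node", "online"], check=False)
        else:
            # Unhealthy: stand the node down so Pacemaker moves the VIP elsewhere.
            subprocess.run(["crm", "node", "standby"], check=False)
```

Because update-status fires roughly every five minutes with default Juju settings, this is also where the worst-case failover delay mentioned above comes from, and the check stops happening entirely if the charm itself is stuck in error.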

Prior to that, we registered the systemd services with hacluster directly. However, that was removed because of a long history of bugs where Pacemaker would take control of the systemd services and fail to run them when it should. See here.

At this point, I see three potential ways to resolve this:

  1. Handle failover during the update-status hook the same way the reactive charm does
  2. Investigate hacluster/pacemaker configuration to see if there is a way to have it monitor the Kubernetes API without taking control of the systemd service
  3. Deprecate hacluster support in kubernetes-control-plane entirely, and instead require kubeapi-load-balancer + hacluster to be used for any solution requiring HA

I suspect it may also be worth talking to the OpenStack team to figure out what the trajectory of the hacluster charm is. Will it be getting an ops uplift?

@Cynerva force-pushed the gkk/hacluster branch 2 times, most recently from 8d471e3 to 8a6f3a2 on November 17, 2023 19:27
@Cynerva marked this pull request as ready for review on November 17, 2023 21:08
@Cynerva (Contributor, Author) commented on Nov 17, 2023

Fixes https://bugs.launchpad.net/bugs/2043695

I added basic VIP failover during the update-status hook, similar to how the reactive charm does it. This should be ready to go.

Review threads on requirements.txt and src/charm.py were marked outdated and resolved.
George Kraft and others added 2 commits November 21, 2023 11:56
@addyess (Member) left a comment

LGTM

@addyess merged commit d167ccf into ops on Nov 29, 2023 (7 checks passed)
@addyess deleted the gkk/hacluster branch on November 29, 2023 14:30