Releases: piraeusdatastore/piraeus-ha-controller
Release 1.2.3
This release fixes an issue where some PersistentVolumes have no apiVersion or kind set in their spec.claimRef, which would cause the HA Controller to ignore those PVs. Those are most likely just referencing PVCs. This seems to happen when Volume Populators are involved. This release considers missing apiVersion and kind to still refer to PVCs.
Fixed
- Consider PVs with ClaimRef without apiVersion and kind set to also refer to PVCs.
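The check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the controller's actual code; the function name `claim_ref_targets_pvc` and the dict-based representation of `spec.claimRef` are assumptions made for the example.

```python
from typing import Optional


def claim_ref_targets_pvc(claim_ref: Optional[dict]) -> bool:
    """Return True if a PV's spec.claimRef should be treated as a PVC reference.

    Hypothetical helper: the claimRef is passed as a plain dict, as it would
    appear after decoding the PV manifest.
    """
    if not claim_ref:
        # No claimRef at all: the PV is not bound to anything.
        return False

    api_version = claim_ref.get("apiVersion") or ""
    kind = claim_ref.get("kind") or ""

    # Missing apiVersion/kind are assumed to mean "a core v1 PVC",
    # matching the behaviour described in this release.
    return api_version in ("", "v1") and kind in ("", "PersistentVolumeClaim")
```

With this rule, a claimRef that only carries `name` and `namespace` is handled the same way as a fully specified one.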
Release 1.2.1
This release fixes a bug that caused many Pods to remain unevicted in the case of node failures.
Fixed
- Fixed detection of Persistent Volume Claims when deciding to fail over Pods.
Release 1.2.0
This release adds a feature to match DRBD connection names to LINSTOR Satellite Pods. Up to now, this was not necessary, as the connection names always matched the node they were running on. With an upcoming change to the Piraeus Operator, this is no longer the case: the connection name may now be the name of the LINSTOR Satellite Pod.
Added
- Option to assume connection names refer to a Pod instead of a Node.
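To illustrate the idea, here is a minimal Python sketch of resolving a DRBD connection name to a Kubernetes node. It is not taken from the controller; `node_for_connection`, the `names_are_pods` switch, and the `pod_to_node` mapping (Satellite Pod name to the node it runs on) are hypothetical names chosen for this example.

```python
from typing import Dict, Optional


def node_for_connection(
    connection_name: str,
    pod_to_node: Dict[str, str],
    names_are_pods: bool,
) -> Optional[str]:
    """Resolve a DRBD connection name to the name of a Kubernetes node.

    names_are_pods=False: the connection name is taken to be the node name.
    names_are_pods=True:  the connection name is looked up as a Satellite Pod,
                          and the node that Pod runs on is returned.
    """
    if not names_are_pods:
        return connection_name

    # Returns None when no Pod with that name is known.
    return pod_to_node.get(connection_name)
```

For example, with `pod_to_node = {"satellite-a": "worker-1"}`, the name `satellite-a` resolves to `worker-1` when the option is enabled and to `satellite-a` itself when it is not.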
Release 1.1.4
Changed
- Wait for initial cache sync before starting the server.
- Ignore non-running Pods during fail-over events.
Release 1.1.2
This release contains updates to dependencies to fix some vulnerabilities.
Changed
- Build with go 1.19
- Bumped dependencies
Release 1.1.1
This patch release fixes a common cause of crashes.
Fixed
- No longer attempt to parse numbers from drbdsetup status that are not relevant. This prevents issues when said numbers are outside the expected range.
Release 1.1.0
The HA Controller got a bit safer by refusing to delete Pods where it can't prove that fail-over is actually safe. This is mostly an issue when using multiple storage providers for volumes of the same Pod: since the HA Controller depends on the specific behaviour of DRBD, we can't guarantee that all external storage providers behave the same.
Added
- Exempt Pods that are attached to other types of storage by default, unless the volumes are known to be safe (such as ConfigMap, DownwardAPI, Secret, and other readonly volumes).
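The exemption rule can be sketched like this. It is a simplified Python illustration, not the controller's implementation: `pod_is_exempt`, `SAFE_VOLUME_SOURCES`, and the `ha_claims` set (names of PVCs known to be backed by DRBD) are assumptions, and only the three volume types named above are treated as safe here.

```python
from typing import Iterable, Set

# Volume sources treated as safe in this sketch. The release notes mention
# "other readonly volumes" as well; those are left out here.
SAFE_VOLUME_SOURCES = {"configMap", "downwardAPI", "secret"}


def pod_is_exempt(volumes: Iterable[dict], ha_claims: Set[str]) -> bool:
    """Return True if the Pod should be exempt from fail-over.

    volumes:   entries of the Pod's spec.volumes, as plain dicts.
    ha_claims: names of PVCs the HA Controller knows how to handle
               (hypothetical input for this example).
    """
    for volume in volumes:
        # Every key except "name" identifies the volume source.
        sources = set(volume) - {"name"}

        if sources and sources <= SAFE_VOLUME_SOURCES:
            continue  # known-safe volume type

        pvc = volume.get("persistentVolumeClaim")
        if pvc is not None and pvc.get("claimName") in ha_claims:
            continue  # volume handled by the HA Controller

        # Anything else: safety cannot be shown, so exempt the Pod.
        return True

    return False
```

The sketch errs on the side of exempting a Pod: any volume that is neither known-safe nor a handled PVC is enough to skip fail-over.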
Fixed
- Fixed a bug that caused the manual Pod exemption from fail-over via annotations to be ignored.
Release 1.0.1
Right after releasing 1.0.0, we noticed some issues that caused the controller pod to crash. This release fixes these issues.
Changed
- Fixed an issue with generated events that would cause the controller to panic because of a nil interface.
- Immediately delete volume attachment if node is not ready.
- Fixed a concurrent map write when failing over multiple resources at once.
Release 1.0.0
Since no issues were reported for the last release candidate, this final release contains no additional changes.
What's new?
With this release, we changed the way the controller works completely. Previous versions would only communicate with the LINSTOR Controller to watch for resource events. This new version uses the DRBD state on every node directly. This also means that the workload changed from a simple deployment to a daemonset.
To give it a try, use the newly created chart in this repository. On full release, it will move to our main chart directory.
helm install --create-namespace --namespace piraeus-ha-controller piraeus-ha-controller charts/piraeus-ha-controller
The new HA Controller is also better equipped to deal with "suspended" resources. Suspended resources fix a frequently encountered problem where DRBD would report errors to consuming pods during network issues. This would often lead to the filesystem silently remounting as read-only. With these new recommended storage class settings, DRBD instead freezes the Pod until it is terminated or networking is restored:
parameters:
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
For more information, check out the updated README.
Release 1.0.0-rc.3
This is the third release candidate for the upcoming 1.0.0 release. Please help by testing it!
Here is what changed since the second release candidate:
- Force deletion of Pods if running Node appears not ready (i.e. it cannot confirm deletion of the Pod). This fixes a bug with StatefulSets, where the Pod was only removed after minutes of being stuck in terminating state.
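As a rough illustration of this change, the Python sketch below picks the delete options for a Pod based on the readiness of its node. The helper name `delete_options` is hypothetical and the logic is a simplification of what the controller does; the only Kubernetes behaviour it relies on is that a delete request with `gracePeriodSeconds: 0` removes the Pod without waiting for the node to confirm termination.

```python
def delete_options(node_ready: bool) -> dict:
    """Build the body of a Pod delete request (hypothetical helper).

    node_ready=True:  no override, the Pod's own termination grace period applies.
    node_ready=False: force deletion, since the node cannot confirm that the
                      Pod has stopped.
    """
    if node_ready:
        return {}

    # A grace period of zero asks the API server to remove the Pod immediately.
    return {"gracePeriodSeconds": 0}
```

This matters for StatefulSets in particular, because a replacement Pod is only created once the old Pod object is gone.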
Other than that, nothing has changed since rc.2, so here are the principal changes:
With this release, we changed the way the controller works completely. Previous versions would only communicate with the LINSTOR Controller to watch for resource events. This new version uses the DRBD state on every node directly. This also means that the workload changed from a simple deployment to a daemonset.
To give it a try, use the newly created chart in this repository. On full release, it will move to our main chart directory.
helm install --create-namespace --namespace piraeus-ha-controller piraeus-ha-controller charts/piraeus-ha-controller
The new HA Controller is also better equipped to deal with "suspended" resources. Suspended resources fix a frequently encountered problem where DRBD would report errors to consuming pods during network issues. This would often lead to the filesystem silently remounting as read-only. With these new recommended storage class settings, DRBD instead freezes the Pod until it is terminated or networking is restored:
parameters:
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
For more information, check out the updated README.