From 76b3db0265f654f966ddb84aae7999eb2aff7c29 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 25 Sep 2024 12:48:36 +0200 Subject: [PATCH] Refactor InfraMachine contract --- .../src/developer/core/controllers/cluster.md | 4 +- .../src/developer/core/controllers/machine.md | 152 +---- .../providers/contracts/infra-cluster.md | 6 +- .../providers/contracts/infra-machine.md | 633 ++++++++++++++---- .../providers/getting-started/webhooks.md | 24 +- 5 files changed, 525 insertions(+), 294 deletions(-) diff --git a/docs/book/src/developer/core/controllers/cluster.md b/docs/book/src/developer/core/controllers/cluster.md index 33e5ad388ce9..ac648fd2f10e 100644 --- a/docs/book/src/developer/core/controllers/cluster.md +++ b/docs/book/src/developer/core/controllers/cluster.md @@ -5,7 +5,7 @@ The Cluster controller is responsible for reconciling the Cluster resource. In order to allow Cluster provisioning on different type of infrastructure, The Cluster resource references an InfraCluster object, e.g. AWSCluster, GCPCluster etc. -The [InfraCluster resource contract](../../providers/contracts/infra-cluster.md) defines a set of rules a provider is expected to comply in order to allow +The [InfraCluster resource contract](../../providers/contracts/infra-cluster.md) defines a set of rules a provider is expected to comply with in order to allow the expected interactions with the Cluster controller. Among those rules: @@ -18,7 +18,7 @@ Among those rules: Similarly, in order to support different solutions for control plane management, The Cluster resource references an ControlPlane object, e.g. KubeadmControlPlane, EKSControlPlane etc. 
-The [ControlPlane resource contract](../../providers/contracts/control-plane.md) defines a set of rules a provider is expected to comply in order to allow +The [ControlPlane resource contract](../../providers/contracts/control-plane.md) defines a set of rules a provider is expected to comply with in order to allow the expected interactions with the Cluster controller. Considering all the info above, the Cluster controller's main responsibilities are: diff --git a/docs/book/src/developer/core/controllers/machine.md b/docs/book/src/developer/core/controllers/machine.md index bd318a381593..861abdfca2a5 100644 --- a/docs/book/src/developer/core/controllers/machine.md +++ b/docs/book/src/developer/core/controllers/machine.md @@ -1,20 +1,39 @@ -# Machine Controller +# Machine Controller -![](../../../images/cluster-admission-machine-controller.png) +The Machine controller is responsible for reconciling the Machine resource. + +In order to allow Machine provisioning on different types of infrastructure, the Machine resource references +an InfraMachine object, e.g. AWSMachine, GCPMachine etc. + +The [InfraMachine resource contract](../../providers/contracts/infra-machine.md) defines a set of rules a provider is expected to comply with in order to allow +the expected interactions with the Machine controller. 
+ +Among those rules: +- InfraMachine MUST report a [provider ID](../../providers/contracts/infra-machine.md#inframachine-provider-id) for the Machine +- InfraMachine SHOULD take into account the [failure domain](../../providers/contracts/infra-machine.md#inframachine-failure-domain) in which machines should be placed +- InfraMachine SHOULD surface the machine's [addresses](../../providers/contracts/infra-machine.md#inframachine-addresses) to help operators when troubleshooting issues +- InfraMachine MUST report when the Machine's infrastructure is [fully provisioned](../../providers/contracts/infra-machine.md#inframachine-initialization-completed) +- InfraMachine SHOULD report [conditions](../../providers/contracts/infra-machine.md#inframachine-conditions) +- InfraMachine SHOULD report [terminal failures](../../providers/contracts/infra-machine.md#inframachine-terminal-failures) + +Similarly, in order to support different machine bootstrappers, the Machine resource references +a BootstrapConfig object, e.g. KubeadmConfig etc. + +The [BootstrapConfig resource contract](../../providers/contracts/bootstrap-config.md) defines a set of rules a provider is expected to comply with in order to allow +the expected interactions with the Machine controller. -The Machine controller's main responsibilities are: +Considering all the info above, the Machine controller's main responsibilities are: -* Setting an OwnerReference on: - * Each Machine object to the Cluster object. - * The associated BootstrapConfig object. - * The associated InfrastructureMachine object. -* Copy data from `BootstrapConfig.Status.DataSecretName` to `Machine.Spec.Bootstrap.DataSecretName` if -`Machine.Spec.Bootstrap.DataSecretName` is empty. -* Setting NodeRefs to be able to associate machines and Kubernetes nodes. -* Deleting Nodes in the target cluster when the associated machine is deleted. -* Cleanup of related objects. 
-* Keeping the Machine's Status object up to date with the InfrastructureMachine's Status object. -* Finding Kubernetes nodes matching the expected providerID in the workload cluster. +* Setting an OwnerReference on the infrastructure object referenced in `Machine.spec.infrastructureRef`. +* Setting an OwnerReference on the bootstrap object referenced in `Machine.spec.bootstrap.configRef`. +* Keeping the Machine's status in sync with the InfraMachine's and BootstrapConfig's status. + * Finding Kubernetes nodes matching the expected providerID in the workload cluster. + * Setting NodeRefs to be able to associate machines and Kubernetes nodes. + * Monitoring Kubernetes nodes and propagating labels to them. +* Cleanup of all owned objects so that nothing is dangling after deletion. + * Draining nodes and waiting for volumes to be detached by CSI plugins. + +![](../../../images/cluster-admission-machine-controller.png) After the machine controller sets the OwnerReferences on the associated objects, it waits for the bootstrap and infrastructure objects referenced by the machine to have the `Status.Ready` field set to `true`. When @@ -25,108 +44,3 @@ The machine controller uses the kubeconfig for the new workload cluster to watch When a node appears with `Node.Spec.ProviderID` matching `Machine.Spec.ProviderID`, the machine controller transitions the associated machine into the `Provisioned` state. When the infrastructure ref is also `Ready`, the machine controller marks the machine as `Running`. - -## Contracts - -### Cluster API - -Cluster associations are made via labels. - -#### Expected labels - -| what | label | value | meaning | -| --- | --- | --- | --- | -| Machine | `cluster.x-k8s.io/cluster-name` | `` | Identify a machine as belonging to a cluster with the name ``| -| Machine | `cluster.x-k8s.io/control-plane` | `true` | Identifies a machine as a control-plane node | - -### Bootstrap provider - -The BootstrapConfig object **must** have a `status` object. 
- -To override the bootstrap provider, a user (or external system) can directly set the `Machine.Spec.Bootstrap.Data` -field. This will mark the machine as ready for bootstrapping and no bootstrap data will be copied from the -BootstrapConfig object. - -#### Required `status` fields - -The `status` object **must** have several fields defined: - -* `ready` - a boolean field indicating the bootstrap config data is generated and ready for use. -* `dataSecretName` - a string field referencing the name of the secret that stores the generated bootstrap data. - -#### Optional `status` fields - -The `status` object **may** define several fields that do not affect functionality if missing: - -* `failureReason` - a string field explaining why a fatal error has occurred, if possible. -* `failureMessage` - a string field that holds the message contained by the error. - -Note: once any of `failureReason` or `failureMessage` surface on the machine who is referencing the bootstrap config object, -they cannot be restored anymore (it is considered a terminal error; the only way to recover is to delete and recreate the machine). -Also, if the machine is under control of a MachineHealthCheck instance, the machine will be automatically remediated. - -Example: - -```yaml -kind: MyBootstrapProviderConfig -apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3 -status: - ready: true - dataSecretName: "MyBootstrapSecret" -``` - -### Infrastructure provider - -The InfrastructureMachine object **must** have both `spec` and `status` objects. - -#### Required `spec` fields - -The `spec` object **must** at least one field defined: - -* `providerID` - a cloud provider ID identifying the machine. - -#### Optional `spec` fields - -The `spec` object **may** define several fields that do not affect functionality if missing: - -* `failureDomain` - is a string identifying the failure domain the instance is running in. 
- -#### Required `status` fields - -The `status` object **must** at least one field defined: - -* `ready` - a boolean field indicating if the infrastructure is ready to be used or not. - -#### Optional `status` fields - -The `status` object **may** define several fields that do not affect functionality if missing: - -* `failureReason` - is a string that explains why a fatal error has occurred, if possible. -* `failureMessage` - is a string that holds the message contained by the error. -* `addresses` - is a `MachineAddresses` (a list of `MachineAddress`) which represents host names, external IP addresses, internal IP addresses, -external DNS names, and/or internal DNS names for the provider's machine instance. `MachineAddress` is -defined as: - - `type` (string): one of `Hostname`, `ExternalIP`, `InternalIP`, `ExternalDNS`, `InternalDNS` - - `address` (string) - -Note: once any of `failureReason` or `failureMessage` surface on the machine who is referencing the infrastructureMachine object, -they cannot be restored anymore (it is considered a terminal error; the only way to recover is to delete and recreate the machine). -Also, if the machine is under control of a MachineHealthCheck instance, the machine will be automatically remediated. 
- -Example: -```yaml -kind: MyMachine -apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3 -spec: - providerID: cloud:////my-cloud-provider-id -status: - ready: true -``` - -### Secrets - -The Machine controller will create a secret or use an existing secret in the following format: - -| secret name | field name | content | -|:---:|:---:|---| -|`-kubeconfig`|`value`|base64 encoded kubeconfig that is authenticated with the child cluster| diff --git a/docs/book/src/developer/providers/contracts/infra-cluster.md b/docs/book/src/developer/providers/contracts/infra-cluster.md index 0908e1e33e87..51242cd5d88f 100644 --- a/docs/book/src/developer/providers/contracts/infra-cluster.md +++ b/docs/book/src/developer/providers/contracts/infra-cluster.md @@ -119,7 +119,7 @@ rules: - watch ``` -Note: The write permissions allow the Cluster controller to set owner references and labels on the InfraCluster” resources; +Note: The write permissions allow the Cluster controller to set owner references and labels on the InfraCluster resources; write permissions are not used for general mutations of InfraCluster resources, unless specifically required (e.g. when using ClusterClass and managed topologies). @@ -271,7 +271,7 @@ Each InfraCluster MUST report when Cluster's infrastructure is fully provisioned ```go type FooClusterStatus struct { - // Ready denotes that the foo cluster infrastructure fully provisioned. + // Ready denotes that the foo cluster infrastructure is fully provisioned. // +optional Ready bool `json:"ready"` @@ -282,7 +282,7 @@ type FooClusterStatus struct { Once `status.ready` the Cluster "core" controller will bubbles up this info in Cluster's `status.infrastructureReady`; If defined, also InfraCluster's `spec.controlPlaneEndpoint` and `status.failureDomains` will be surfaced on Cluster's -corresponding field at the same time. +corresponding fields at the same time.