From 496f5c81ba9de80f299bacb96a3c8f8e7cc5eda2 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 30 Sep 2024 14:55:44 +0200 Subject: [PATCH] Refactor the BootstrapConfig contract --- docs/book/src/SUMMARY.md | 3 +- .../developer/core/controllers/bootstrap.md | 16 - .../src/developer/core/controllers/machine.md | 13 + .../providers/contracts/bootstrap-config.md | 533 ++++++++++++++---- .../providers/contracts/infra-cluster.md | 8 +- .../providers/contracts/infra-machine.md | 8 +- .../providers/security-guidelines.md | 4 + ...oller.plantuml => machine-phases.plantuml} | 0 ...trap-controller.png => machine-phases.png} | Bin 9 files changed, 449 insertions(+), 136 deletions(-) delete mode 100644 docs/book/src/developer/core/controllers/bootstrap.md rename docs/book/src/images/{bootstrap-controller.plantuml => machine-phases.plantuml} (100%) rename docs/book/src/images/{bootstrap-controller.png => machine-phases.png} (100%) diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md index 95b0f1c32552..6eab4c8285e1 100644 --- a/docs/book/src/SUMMARY.md +++ b/docs/book/src/SUMMARY.md @@ -77,8 +77,7 @@ - [MachineSet](./developer/core/controllers/machine-set.md) - [Machine](./developer/core/controllers/machine.md) - [MachinePool](./developer/core/controllers/machine-pool.md) - - [MachineHealthCheck](./developer/core/controllers/machine-health-check.md) - - [Bootstrap](./developer/core/controllers/bootstrap.md) + - [MachineHealthCheck](./developer/core/controllers/machine-health-check.md) - [Control Plane](./developer/core/controllers/control-plane.md) - [Logging](developer/core/logging.md) - [Testing](developer/core/testing.md) diff --git a/docs/book/src/developer/core/controllers/bootstrap.md b/docs/book/src/developer/core/controllers/bootstrap.md deleted file mode 100644 index 85e05fec2fb3..000000000000 --- a/docs/book/src/developer/core/controllers/bootstrap.md +++ /dev/null @@ -1,16 +0,0 @@ -# Bootstrap Controller - -Bootstrapping is the process in which: - 
-1. A cluster is bootstrapped -1. A machine is bootstrapped and takes on a role within a cluster - -[CABPK](https://github.com/kubernetes-sigs/cluster-api/tree/main/bootstrap/kubeadm) is the reference bootstrap provider and is based on `kubeadm`. CABPK codifies the steps for [creating a cluster](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) in multiple configurations. - -![](../../../images/bootstrap-controller.png) - -See [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20190610-machine-states-preboot-bootstrapping.md) for the full details on how the bootstrap process works. - -### Implementations - -* [Kubeadm](https://github.com/kubernetes-sigs/cluster-api/tree/main/bootstrap/kubeadm) (Reference Implementation) diff --git a/docs/book/src/developer/core/controllers/machine.md b/docs/book/src/developer/core/controllers/machine.md index 861abdfca2a5..75f474aeed7f 100644 --- a/docs/book/src/developer/core/controllers/machine.md +++ b/docs/book/src/developer/core/controllers/machine.md @@ -22,6 +22,14 @@ a BootstrapConfig object, e.g. KubeadmBoostrapConfig etc. The [BootstrapConfig resource contract](../../providers/contracts/bootstrap-config.md) defines a set of rules a provider is expected to comply with in order to allow the expected interactions with the Machine controller. 
+Among those rules: +- BootstrapConfig MUST create a [bootstrap data secret](../../providers/contracts/bootstrap-config.md#bootstrapconfig-data-secret) holding the generated bootstrap data for the machine +- BootstrapConfig MUST report when Machine's bootstrap data secret is [fully provisioned](../../providers/contracts/bootstrap-config.md#bootstrapconfig-initialization-completed) +- BootstrapConfig SHOULD report [conditions](../../providers/contracts/bootstrap-config.md#bootstrapconfig-conditions) +- BootstrapConfig SHOULD report [terminal failures](../../providers/contracts/bootstrap-config.md#bootstrapconfig-terminal-failures) +- BootstrapConfig SHOULD [taint Nodes at creation](../../providers/contracts/bootstrap-config.md#taint-nodes-at-creation) +- BootstrapConfig SHOULD create a [sentinel file](../../providers/contracts/bootstrap-config.md#sentinel-file) on machines + Considering all the info above, the Machine controller's main responsibilities are: * Setting an OwnerReference on the infrastructure object referenced in `Machine.spec.infrastructureRef`. @@ -44,3 +52,8 @@ The machine controller uses the kubeconfig for the new workload cluster to watch When a node appears with `Node.Spec.ProviderID` matching `Machine.Spec.ProviderID`, the machine controller transitions the associated machine into the `Provisioned` state. When the infrastructure ref is also `Ready`, the machine controller marks the machine as `Running`. + +The following diagram walks through machine phases and interactions with InfraMachine and BootstrapConfig +happening at each step. 
+![](../../../images/machine-phases.png) diff --git a/docs/book/src/developer/providers/contracts/bootstrap-config.md b/docs/book/src/developer/providers/contracts/bootstrap-config.md index baf890fda2cd..cd45511efbb5 100644 --- a/docs/book/src/developer/providers/contracts/bootstrap-config.md +++ b/docs/book/src/developer/providers/contracts/bootstrap-config.md @@ -1,107 +1,438 @@ -# Bootstrap Provider Specification +# Contract rules for BootstrapConfig -## Overview +Bootstrap providers SHOULD implement a BootstrapConfig resource. -A bootstrap provider generates bootstrap data that is used to bootstrap a Kubernetes node. +The goal of a BootstrapConfig resource is to generate bootstrap data that is used to bootstrap a Kubernetes node. +These may be e.g. [cloud-init] scripts. -For example, the Kubeadm bootstrap provider uses a [cloud-init](https://cloudinit.readthedocs.io/en/latest/) file for bootstrapping a node. +The BootstrapConfig resource will be referenced by one of the Cluster API core resources, Machine. -## Data Types +The [Machine's controller](../../core/controllers/machine.md) is responsible for coordinating operations of the BootstrapConfig, +and the interaction between the Machine's controller and the BootstrapConfig resource is based on the contract +rules defined in this page. -### Bootstrap API resource -A bootstrap provider must define an API type for bootstrap resources. The type: +Once contract rules are satisfied by a BootstrapConfig implementation, other implementation details +could be addressed according to the specific needs (Cluster API is not prescriptive). -1. Must belong to an API group served by the Kubernetes apiserver -2. Must be implemented as a CustomResourceDefinition. - 1. The CRD name must have the format produced by `sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`. -3. Must be namespace-scoped -4. Must have the standard Kubernetes "type metadata" and "object metadata" -5. 
Should have a `spec` field containing fields relevant to the bootstrap provider -6. Must have a `status` field with the following: - 1. Required fields: - 1. `ready` (boolean): indicates the bootstrap data has been generated and is ready - 1. `dataSecretName` (string): the name of the secret that stores the generated bootstrap data - 2. Optional fields: - 1. `failureReason` (string): indicates there is a fatal problem reconciling the bootstrap data; - meant to be suitable for programmatic interpretation - 2. `failureMessage` (string): indicates there is a fatal problem reconciling the bootstrap data; - meant to be a more descriptive value than `failureReason` +Nevertheless, it is always recommended to take a look at Cluster API controllers, +in-tree providers, other providers and use them as a reference implementation (unless custom solutions are required +in order to address very specific needs). -Note: once any of `failureReason` or `failureMessage` surface on the machine/machine pool who is referencing the bootstrap config object, -they cannot be restored anymore (it is considered a terminal error; the only way to recover is to delete and recreate the machine/machine pool). -Also, if the machine is under control of a MachineHealthCheck instance, the machine will be automatically remediated. +In order to facilitate the initial design for each BootstrapConfig resource, a few [implementation best practices] +are explicitly called out in dedicated pages. -Note: because the `dataSecretName` is part of `status`, this value must be deterministically recreatable from the data in the -`Cluster`, `Machine`, and/or bootstrap resource. If the name is randomly generated, it is not always possible to move -the resource and its associated secret from one management cluster to another. 
+ + +## Rules (contract version v1beta1) + +| Rule | Mandatory | Note | +|----------------------------------------------------------------------------|-----------|--------------------------------------| +| [All resources: scope] | Yes | | +| [All resources: `TypeMeta` and `ObjectMeta`field] | Yes | | +| [All resources: `APIVersion` field value] | Yes | | +| [BootstrapConfig, BootstrapConfigList resource definition] | Yes | | +| [BootstrapConfig: data secret] | Yes | | +| [BootstrapConfig: initialization completed] | Yes | | +| [BootstrapConfig: conditions] | No | | +| [BootstrapConfig: terminal failures] | No | | +| [BootstrapConfigTemplate, BootstrapConfigTemplateList resource definition] | Yes | | +| [BootstrapConfigTemplate: support for SSA dry run] | No | Mandatory for ClusterClasses support | +| [Sentinel file] | No | | +| [Taint Nodes at creation] | No | | +| [Support for running multiple instances] | No | Mandatory for clusterctl CLI support | +| [Clusterctl support] | No | Mandatory for clusterctl CLI support | + +Note: +- `All resources` refers to all the provider's resources "core" Cluster API interacts with; + In the context of this page: `BootstrapConfig`, `BootstrapConfigTemplate` and corresponding list types + +### All resources: scope + +All resources MUST be namespace-scoped. + +### All resources: `TypeMeta` and `ObjectMeta` field + +All resources MUST have the standard Kubernetes `TypeMeta` and `ObjectMeta` fields. + +### All resources: `APIVersion` field value + +In Kubernetes `APIVersion` is a combination of API group and version. +Special consideration MUST apply to both API group and version for all the resources Cluster API interacts with. + +#### All resources: API group + +The domain for Cluster API resources is `cluster.x-k8s.io`, and bootstrap providers under the Kubernetes SIGS org +generally use `bootstrap.cluster.x-k8s.io` as API group. 
+ +If your provider uses a different API group, you MUST grant full read/write RBAC permissions for resources in your API group +to the Cluster API core controllers. The canonical way to do so is via a `ClusterRole` resource with the [aggregation label] +`cluster.x-k8s.io/aggregate-to-manager: "true"`. + +The following is an example ClusterRole for a `FooConfig` resource in the `bootstrap.foo.com` API group: + +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: capi-foo-clusters + labels: + cluster.x-k8s.io/aggregate-to-manager: "true" +rules: +- apiGroups: + - bootstrap.foo.com + resources: + - fooconfig + - fooconfigtemplates + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +``` + +Note: The write permissions are required because Cluster API manages BootstrapConfig generated from BootstrapConfigTemplates; +when using ClusterClass and managed topologies, also BootstrapConfigTemplates are managed directly by Cluster API. + +#### All resources: version + +The resource Version defines the stability of the API and its backward compatibility guarantees. +Examples include `v1alpha1`, `v1beta1`, `v1`, etc. and are governed by the [Kubernetes API Deprecation Policy]. + +Your provider SHOULD abide by the same policies. + +Note: The version of your provider does not need to be in sync with the version of core Cluster API resources. +Instead, prefer choosing a version that matches the stability of the provider API and its backward compatibility guarantees. + +Additionally: + +Providers MUST set `cluster.x-k8s.io/` label on the BootstrapConfig Custom Resource Definitions. + +The label is a map from a Cluster API contract version to your Custom Resource Definition versions. +The value is an underscore-delimited (_) list of versions. Each value MUST point to an available version in your CRD Spec. 
+ +The label allows Cluster API controllers to perform automatic conversions for object references, the controllers will pick +the last available version in the list if multiple versions are found. + +To apply the label to CRDs it’s possible to use commonLabels in your `kustomize.yaml` file, usually in `config/crd`: + +```yaml +commonLabels: + cluster.x-k8s.io/v1alpha2: v1alpha1 + cluster.x-k8s.io/v1alpha3: v1alpha2 + cluster.x-k8s.io/v1beta1: v1beta1 +``` + +An example of this is in the [Kubeadm Bootstrap provider](https://github.com/kubernetes-sigs/cluster-api/blob/release-1.1/controlplane/kubeadm/config/crd/kustomization.yaml). + +### BootstrapConfig, BootstrapConfigList resource definition + +You MUST define a BootstrapConfig resource. +The BootstrapConfig resource name must have the format produced by `sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`. + +Note: Cluster API is using such a naming convention to avoid an expensive CRD lookup operation when looking for labels from +the CRD definition of the BootstrapConfig resource. + +It is a generally applied convention to use names in the format `${env}Config`, where ${env} is a, possibly short, name +for the bootstrapper in question. For example `KubeadmConfig` is an implementation for kubeadm. +```go // +kubebuilder:object:root=true -// +kubebuilder:resource:path=phippybootstrapconfigtemplates,scope=Namespaced,categories=cluster-api,shortName=pbct +// +kubebuilder:resource:path=fooconfig,scope=Namespaced,categories=cluster-api // +kubebuilder:storageversion +// +kubebuilder:subresource:status -// PhippyBootstrapConfigTemplate is the Schema for the Phippy Bootstrap API. -type PhippyBootstrapConfigTemplate struct { - metav1.TypeMeta `json:",inline"` +// FooConfig is the Schema for fooconfig. 
+type FooConfig struct { + metav1.TypeMeta `json:",inline"` metav1.ObjectMeta `json:"metadata,omitempty"` + Spec FooConfigSpec `json:"spec,omitempty"` + Status FooConfigStatus `json:"status,omitempty"` +} + +type FooConfigSpec struct { + // See other rules for more details about mandatory/optional fields in BootstrapConfig spec. + // Other fields SHOULD be added based on the needs of your provider. +} + +type FooConfigStatus struct { + // See other rules for more details about mandatory/optional fields in BootstrapConfig status. + // Other fields SHOULD be added based on the needs of your provider. +} +``` + +For each BootstrapConfig resource, you MUST also add the corresponding list resource. +The list resource MUST be named as `List`. + +```go +// +kubebuilder:object:root=true - Spec PhippyBootstrapConfigTemplateSpec `json:"spec,omitempty"` +// FooConfigList contains a list of fooconfig. +type FooConfigList struct { + metav1.TypeMeta `json:",inline"` + metav1.ListMeta `json:"metadata,omitempty"` + Items []FooConfig `json:"items"` } +``` + +### BootstrapConfig: data secret + +Each BootstrapConfig MUST store generated bootstrap data into a Kubernetes Secret. + +The Secret containing bootstrap data must: + +1. Use the API resource's `status.dataSecretName` for its name +1. Have the label `cluster.x-k8s.io/cluster-name` set to the name of the cluster +1. Have a controller owner reference to the API resource +1. Have a single key, `value`, containing the bootstrap data + +Note: because the `dataSecretName` is part of `status`, this value must be deterministically recreatable from the data in the +`Cluster`, `Machine`, and/or bootstrap resource. If the name is randomly generated, it is not always possible to move +the resource and its associated secret from one management cluster to another. 
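A Secret that satisfies the four rules above could look roughly like the following. This is only an illustrative sketch: the names `my-cluster` and `my-machine-bootstrap`, the `bootstrap.foo.com/v1beta1` API version, the namespace, and the placeholder UID are all hypothetical, and reusing the FooConfig's own name as the Secret name is just one assumed way to keep `status.dataSecretName` deterministic.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Hypothetical name; it must equal the FooConfig's status.dataSecretName.
  name: my-machine-bootstrap
  namespace: default
  labels:
    # Set to the name of the Cluster the machine belongs to.
    cluster.x-k8s.io/cluster-name: my-cluster
  ownerReferences:
  # Controller owner reference pointing back to the BootstrapConfig object.
  - apiVersion: bootstrap.foo.com/v1beta1
    kind: FooConfig
    name: my-machine-bootstrap
    uid: 00000000-0000-0000-0000-000000000000 # placeholder, set by the API server in practice
    controller: true
data:
  # Single key `value`; here the base64 encoding of a bare "#cloud-config" header line.
  value: I2Nsb3VkLWNvbmZpZwo=
```

Deriving the Secret name from an existing object name, as assumed here, is one way to let the name be recomputed after the resources are moved to another management cluster.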
+ +When the Secret is created its name MUST surface in the `status.dataSecretName` field of the BootstrapConfig resource; +the Machine controller will surface this info in Machine's `spec.bootstrap.dataSecretName` when [BootstrapConfig: initialization completed]. + +### BootstrapConfig: initialization completed + +Each BootstrapConfig MUST report when the bootstrap data secret is fully provisioned (initialization) by setting +`status.ready` in the BootstrapConfig resource. + +```go +type FooConfigStatus struct { + // Ready denotes that the foo bootstrap data secret is fully provisioned. + // +optional + Ready bool `json:"ready"` + + // See other rules for more details about mandatory/optional fields in BootstrapConfig status. + // Other fields SHOULD be added based on the needs of your provider. +} +``` + +Once `status.ready` is set, the Machine "core" controller will bubble up this info in Machine's `status.bootstrapReady`; +BootstrapConfig's `status.dataSecretName` will also be surfaced on Machine's corresponding fields at the same time. + + + +### BootstrapConfig: conditions + +According to [Kubernetes API Conventions], Conditions provide a standard mechanism for higher-level +status reporting from a controller. -type PhippyBootstrapConfigTemplateResource struct { - // Standard object's metadata. - // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata - // +optional - ObjectMeta clusterv1.ObjectMeta `json:"metadata,omitempty"` +Provider implementers SHOULD implement `status.conditions` for their BootstrapConfig resource. +In case conditions are implemented, Cluster API condition type MUST be used. - Spec PhippyBootstrapConfigSpec `json:"spec"` +If a condition with type `Ready` exists, such condition will be mirrored in Machine's `BootstrapConfigReady` condition. 
+ +Please note that the `Ready` condition is expected to surface the status of the BootstrapConfig during its own entire lifecycle, +including initial provisioning, but not limited to that. + +See [Cluster API condition proposal] for more context. + + + +### BootstrapConfig: terminal failures + +Each BootstrapConfig SHOULD report when the BootstrapConfig enters a state that cannot be recovered (terminal failure) by +setting `status.failureReason` and `status.failureMessage` in the BootstrapConfig resource. + +```go +type FooConfigStatus struct { + // FailureReason will be set in the event that there is a terminal problem reconciling the FooConfig + // and will contain a succinct value suitable for machine interpretation. + // + // This field should not be set for transient errors that can be fixed automatically or with manual intervention, + // but instead indicate that something is fundamentally wrong with the FooConfig and that it cannot be recovered. + // +optional + FailureReason *capierrors.ClusterStatusError `json:"failureReason,omitempty"` + + // FailureMessage will be set in the event that there is a terminal problem reconciling the FooConfig + // and will contain a more verbose string suitable for logging and human consumption. + // + // This field should not be set for transient errors that can be fixed automatically or with manual intervention, + // but instead indicate that something is fundamentally wrong with the FooConfig and that it cannot be recovered. + // +optional + FailureMessage *string `json:"failureMessage,omitempty"` + + // See other rules for more details about mandatory/optional fields in BootstrapConfig status. + // Other fields SHOULD be added based on the needs of your provider. } ``` -The CRD name of the template must also have the format produced by `sigs.k8s.io/cluster-api/util/contract.CalculateCRDName(Group, Kind)`. 
+Once `status.failureReason` and `status.failureMessage` are set on the BootstrapConfig resource, the Machine "core" controller +will surface this info in the corresponding fields in Machine's `status`. + +Please note that once failureReason/failureMessage is set in Machine's `status`, the only way to recover is to delete and +recreate the Machine (it is a terminal failure). + + -### List Resources +### BootstrapConfigTemplate, BootstrapConfigTemplateList resource definition -For any resource, also add list resources, e.g. +For a given BootstrapConfig resource, you MUST also add a corresponding BootstrapConfigTemplate resource in order to use it +when defining sets of machines, e.g. MachineDeployments. + +The template resource MUST be named as `Template`. ```go -//+kubebuilder:object:root=true +// +kubebuilder:object:root=true +// +kubebuilder:resource:path=fooconfigtemplates,scope=Namespaced,categories=cluster-api +// +kubebuilder:storageversion -// PhippyBootstrapConfigList contains a list of Phippy Bootstrap Configurations. -type PhippyBootstrapConfigList struct { - metav1.TypeMeta `json:",inline"` - metav1.ListMeta `json:"metadata,omitempty"` - Items []PhippyBootstrapConfig `json:"items"` +// FooConfigTemplate is the Schema for the fooconfigtemplates API. +type FooConfigTemplate struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` + + Spec FooConfigTemplateSpec `json:"spec,omitempty"` } -//+kubebuilder:object:root=true +type FooConfigTemplateSpec struct { + Template FooConfigTemplateResource `json:"template"` +} -// PhippyBootstrapConfigTemplateList contains a list of PhippyBootstrapConfigTemplate. -type PhippyBootstrapConfigTemplateList struct { - metav1.TypeMeta `json:",inline"` - metav1.ListMeta `json:"metadata,omitempty"` - Items []PhippyBootstrapConfigTemplate `json:"items"` +type FooConfigTemplateResource struct { + // Standard object's metadata. 
+ // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata + // +optional + ObjectMeta clusterv1.ObjectMeta `json:"metadata,omitempty"` + Spec FooConfigSpec `json:"spec"` } ``` +NOTE: in this example BootstrapConfigTemplate's `spec.template.spec` embeds `FooConfigSpec` from BootstrapConfig. This might not always be +the best choice depending on if/how BootstrapConfig's spec fields apply to many machines vs only one. -### Bootstrap Secret +For each BootstrapConfigTemplate resource, you MUST also add the corresponding list resource. +The list resource MUST be named as `List`. -The `Secret` containing bootstrap data must: +```go +// +kubebuilder:object:root=true -1. Use the API resource's `status.dataSecretName` for its name -1. Have the label `cluster.x-k8s.io/cluster-name` set to the name of the cluster -1. Have a controller owner reference to the API resource -1. Have a single key, `value`, containing the bootstrap data +// FooConfigTemplateList contains a list of FooConfigTemplates. +type FooConfigTemplateList struct { + metav1.TypeMeta `json:",inline"` + metav1.ListMeta `json:"metadata,omitempty"` + Items []FooConfigTemplate `json:"items"` +} +``` + +### BootstrapConfigTemplate: support for SSA dry run + +When Cluster API's topology controller is trying to identify differences between templates defined in a ClusterClass and +the current Cluster topology, it needs to run a [Server Side Apply] (SSA) dry run call. + +However, if you implement immutability checks for your BootstrapConfigTemplate, this can cause the SSA dry run call to fail. + +In order to avoid this, BootstrapConfigTemplate MUST specifically implement support for SSA dry run calls from the topology controller. + +The implementation requires using controller runtime's `CustomValidator`, available in CR versions >= v0.12.3. 
+ +This allows skipping the immutability check only when the topology controller is dry running while preserving the +validation behavior for all other cases. + +See [the DockerMachineTemplate webhook] as a reference for a compatible implementation. + +### Sentinel file + +A bootstrap provider's bootstrap data must create `/run/cluster-api/bootstrap-success.complete` +(or `C:\run\cluster-api\bootstrap-success.complete` for Windows machines) upon successful bootstrapping of a Kubernetes node. +This allows infrastructure providers to detect and act on bootstrap failures. + +### Taint Nodes at creation + +A bootstrap provider can optionally taint worker nodes at creation with `node.cluster.x-k8s.io/uninitialized:NoSchedule`. +This taint is used to prevent workloads from being scheduled on Nodes before the node is initialized by Cluster API. +As of today the Node initialization consists of syncing labels from Machines to Nodes. Once the labels have been +initially synced the taint is removed from the Node. + +### Support for running multiple instances + +Cluster API does not support running multiple instances of the same provider, which some might +consider an alternative way to implement multi-tenancy; the same applies to the clusterctl CLI. + +See [Support running multiple instances of the same provider] for more context. + +However, if you want to make it possible for users to run multiple instances of your provider, your controllers SHOULD: + +- support the `--namespace` flag. +- support the `--watch-filter` flag. + +Please read the page linked above carefully to fully understand implications and risks related to this option. + +### Clusterctl support + +The clusterctl command is designed to work with all the providers compliant with the rules defined in the [clusterctl provider contract]. + +## Typical BootstrapConfig reconciliation workflow + +A bootstrap provider must respond to changes to its BootstrapConfig resources. 
This process is +typically called reconciliation. The provider must watch for new, updated, and deleted resources and respond +accordingly. + +As a reference you can look at the following workflow to understand how the typical reconciliation workflow +is implemented in BootstrapConfig controllers: ## Behavior @@ -125,46 +456,28 @@ The following diagram shows the typical logic for a bootstrap provider: 1. Set `status.ready` to true 1. Patch the resource to persist changes -## Sentinel File - -A bootstrap provider's bootstrap data must create `/run/cluster-api/bootstrap-success.complete` (or `C:\run\cluster-api\bootstrap-success.complete` for Windows machines) upon successful bootstrapping of a Kubernetes node. This allows infrastructure providers to detect and act on bootstrap failures. - -## Taint Nodes at creation - -A bootstrap provider can optionally taint worker nodes at creation with `node.cluster.x-k8s.io/uninitialized:NoSchedule`. -This taint is used to prevent workloads to be scheduled on Nodes before the node is initialized by Cluster API. -As of today the Node initialization consists of syncing labels from Machines to Nodes. Once the labels have been -initially synced the taint is removed from the Node. - -## RBAC - -### Provider controller - -A bootstrap provider must have RBAC permissions for the types it defines, as well as the bootstrap data `Secret` -resources it manages. If you are using `kubebuilder` to generate new API types, these permissions should be configured -for you automatically. For example, the Kubeadm bootstrap provider the following configuration for its `KubeadmConfig` -type: - -``` -// +kubebuilder:rbac:groups=bootstrap.cluster.x-k8s.io,resources=kubeadmconfigs;kubeadmconfigs/status,verbs=get;list;watch;create;update;patch;delete -// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;create;update;patch;delete -``` - -A bootstrap provider may also need RBAC permissions for other types, such as `Cluster`. 
If you need -read-only access, you can limit the permissions to `get`, `list`, and `watch`. The following -configuration can be used for retrieving `Cluster` resources: - -``` -// +kubebuilder:rbac:groups=cluster.x-k8s.io,resources=clusters;clusters/status,verbs=get;list;watch -``` - -### Cluster API controllers - -The Cluster API controller for `Machine` resources is configured with full read/write RBAC permissions for all resources -in the `bootstrap.cluster.x-k8s.io` API group. This group represents all bootstrap providers for SIG Cluster -Lifecycle-sponsored provider subprojects. If you are writing a provider not sponsored by the SIG, you must add new RBAC -permissions for the Cluster API `manager-role` role, granting it full read/write access to the bootstrap resource in -your API group. - -Note, the write permissions allow the `Machine` controller to set owner references and labels on the bootstrap -resources; they are not used for general mutations of these resources. +[cloud-init]: https://cloudinit.readthedocs.io/en/latest/ +[All resources: Scope]: #all-resources-scope +[All resources: `TypeMeta` and `ObjectMeta`field]: #all-resources-typemeta-and-objectmeta-field +[All resources: `APIVersion` field value]: #all-resources-apiversion-field-value +[aggregation label]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles +[Kubernetes API Deprecation Policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/ +[BootstrapConfig, BootstrapConfigList resource definition]: #bootstrapconfig-bootstrapconfiglist-resource-definition +[BootstrapConfig: data secret]: #bootstrapconfig-data-secret +[BootstrapConfig: initialization completed]: #bootstrapconfig-initialization-completed +[Improving status in CAPI resources]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md +[BootstrapConfig: conditions]: #bootstrapconfig-conditions +[Kubernetes API 
Conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties +[Cluster API condition proposal]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md +[BootstrapConfig: terminal failures]: #bootstrapconfig-terminal-failures +[BootstrapConfigTemplate, BootstrapConfigTemplateList resource definition]: #bootstrapconfigtemplate-bootstrapconfigtemplatelist-resource-definition +[BootstrapConfigTemplate: support for SSA dry run]: #bootstrapconfigtemplate-support-for-ssa-dry-run +[Sentinel file]: #sentinel-file +[Taint Nodes at creation]: #taint-nodes-at-creation +[Support for running multiple instances]: #support-for-running-multiple-instances +[Support running multiple instances of the same provider]: ../../core/support-multiple-instances.md +[Clusterctl support]: #clusterctl-support +[clusterctl provider contract]: clusterctl.md +[implementation best practices]: ../best-practices.md +[Server Side Apply]: https://kubernetes.io/docs/reference/using-api/server-side-apply/ +[the DockerMachineTemplate webhook]: https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/internal/webhooks/dockermachinetemplate_webhook.go diff --git a/docs/book/src/developer/providers/contracts/infra-cluster.md b/docs/book/src/developer/providers/contracts/infra-cluster.md index 51242cd5d88f..6e5cb0cc5ca1 100644 --- a/docs/book/src/developer/providers/contracts/infra-cluster.md +++ b/docs/book/src/developer/providers/contracts/infra-cluster.md @@ -12,7 +12,7 @@ and the interaction between the Cluster's controller and the InfraCluster resour rules defined in this page. Once contract rules are satisfied by an InfraCluster implementation, other implementation details -could be addressed according to the specific needs (Cluster API in not prescriptive). +could be addressed according to the specific needs (Cluster API is not prescriptive). 
Nevertheless, it is always recommended to take a look at Cluster API controllers, in-tree providers, other providers and use them as a reference implementation (unless custom solutions are required @@ -236,7 +236,7 @@ type APIEndpoint struct { ``` Once `spec.controlPlaneEndpoint` is set on the InfraCluster resource and the [InfraCluster initialization completed], -the Cluster controller will bubble up this info in Cluster's `spec.controlPlaneEndpoint`. +the Cluster controller will surface this info in Cluster's `spec.controlPlaneEndpoint`. If instead you are developing an infrastructure provider which is NOT responsible to provide a control plane endpoint, the implementer should exit reconciliation until it sees Cluster's `spec.controlPlaneEndpoint` populated. @@ -262,7 +262,7 @@ type FooClusterStatus struct { - `attributes map[string]string`: arbitrary attributes for users to apply to a failure domain. Once `status.failureDomains` is set on the InfraCluster resource and the [InfraCluster initialization completed], -the Cluster controller will bubble up this info in Cluster's `status.failureDomains`. +the Cluster controller will surface this info in Cluster's `status.failureDomains`. ### InfraCluster: initialization completed @@ -364,7 +364,7 @@ type FooClusterStatus struct { ``` Once `status.failureReason` and `status.failureMessage` are set on the InfraCluster resource, the Cluster "core" controller -will bubble up those info in the corresponding fields in Cluster's `status`. +will surface those info in the corresponding fields in Cluster's `status`. Please note that once failureReason/failureMessage is set in Cluster's `status`, the only way to recover is to delete and recreate the Cluster (it is a terminal failure). 
diff --git a/docs/book/src/developer/providers/contracts/infra-machine.md b/docs/book/src/developer/providers/contracts/infra-machine.md index a33a597b6d09..16dd622ab4a3 100644 --- a/docs/book/src/developer/providers/contracts/infra-machine.md +++ b/docs/book/src/developer/providers/contracts/infra-machine.md @@ -12,7 +12,7 @@ and the interaction between the Machine's controller and the InfraMachine resour rules defined in this page. Once contract rules are satisfied by an InfraMachine implementation, other implementation details -could be addressed according to the specific needs (Cluster API in not prescriptive). +could be addressed according to the specific needs (Cluster API is not prescriptive). Nevertheless, it is always recommended to take a look at Cluster API controllers, in-tree providers, other providers and use them as a reference implementation (unless custom solutions are required @@ -215,7 +215,7 @@ type FooMachineSpec struct { ``` Once `spec.providerID` is set on the InfraMachine resource and the [InfraMachine initialization completed], -the Cluster controller will bubble up this info in Machine's `spec.providerID`. +the Cluster controller will surface this info in Machine's `spec.providerID`. ### InfraMachine: failure domain @@ -275,7 +275,7 @@ type FooMachineStatus struct { Each MachineAddress must have a type; accepted types are `Hostname`, `ExternalIP`, `InternalIP`, `ExternalDNS` or `InternalDNS`. Once `status.addresses` is set on the InfraMachine resource and the [InfraMachine initialization completed], -the Machine controller will bubble up this info in Machine's `status.addresses`. +the Machine controller will surface this info in Machine's `status.addresses`. 
### InfraMachine: initialization completed @@ -377,7 +377,7 @@ type FooMachineStatus struct { ``` Once `status.failureReason` and `status.failureMessage` are set on the InfraMachine resource, the Machine "core" controller -will bubble up those info in the corresponding fields in Machine's `status`. +will surface this info in the corresponding fields in Machine's `status`. Please note that once failureReason/failureMessage is set in Machine's `status`, the only way to recover is to delete and recreate the Machine (it is a terminal failure). diff --git a/docs/book/src/developer/providers/security-guidelines.md b/docs/book/src/developer/providers/security-guidelines.md index d27c2a2d00d3..a2719de6b3a0 100644 --- a/docs/book/src/developer/providers/security-guidelines.md +++ b/docs/book/src/developer/providers/security-guidelines.md @@ -6,6 +6,7 @@ There are several critical areas that any infrastructure provider implementer mu - **Ensuring secure access to VMs** for troubleshooting, with proper authentication methods. - **Controlling manual operations** performed on cloud infrastructure targeted by the provider. - **Housekeeping** of the cloud infrastructure, ensuring timely cleanup and garbage collection of unused resources. +- **Securing Machine's bootstrap data**, ensuring protection of the sensitive data that might be included in it. The following list outlines high-level security recommendations. It is a community-maintained resource, and everyone’s contributions are essential to continuously improve and adapt these best practices. Each provider implementer is responsible for translating these recommendations to fit the context of their specific cloud provider: @@ -23,3 +24,6 @@ The following list outlines high-level security recommendations. It is a communi 5. 
**Resource Housekeeping**: Any cloud resource not linked to a cluster after a fixed configurable period, created by cloud credentials, should be automatically deleted or marked for garbage collection to avoid resource sprawl. + +6. **Securing Machine's bootstrap data**: + Bootstrap data is usually stored in the machine's metadata and might contain sensitive data, e.g. Cluster secrets, user credentials, SSH certificates, etc. It is important to protect that metadata or, if that is not possible, to clean it up immediately after machine bootstrap. diff --git a/docs/book/src/images/bootstrap-controller.plantuml b/docs/book/src/images/machine-phases.plantuml similarity index 100% rename from docs/book/src/images/bootstrap-controller.plantuml rename to docs/book/src/images/machine-phases.plantuml diff --git a/docs/book/src/images/bootstrap-controller.png b/docs/book/src/images/machine-phases.png similarity index 100% rename from docs/book/src/images/bootstrap-controller.png rename to docs/book/src/images/machine-phases.png