This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

operator: distributed provisioning, scheduler extensions
This finishes the work for distributed provisioning by also supporting
it in the operator and cleaning up testing.

Because the controller part of PMEM-CSI is only useful when scheduler
extensions are desired, new API fields in the PmemCSIDeployment CRD
now control how the controller, the scheduler service and the webhook
configuration are created.

Using a node port for the scheduler service is a workaround which we
need for testing with the kubeadm-based cluster, but it shouldn't be
the default, so it gets patched in when deploying instead of having it
in the scheduler-service.yaml file.
pohly committed Jan 15, 2021
1 parent dbc8960 commit 74f44f1
Showing 29 changed files with 1,016 additions and 866 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -208,7 +208,7 @@ $(KUSTOMIZE_OUTPUT): _work/kustomize $(KUSTOMIZE_INPUT)
fi

kustomize: _work/go-bindata clean_kustomize_output $(KUSTOMIZE_OUTPUT)
-$< -o deploy/bindata_generated.go -pkg deploy deploy/kubernetes-*/*/pmem-csi.yaml
+$< -o deploy/bindata_generated.go -pkg deploy deploy/kubernetes-*/*/pmem-csi.yaml deploy/kustomize/webhook/webhook.yaml deploy/kustomize/scheduler/scheduler-service.yaml

clean_kustomize_output:
rm -rf deploy/kubernetes-*
68 changes: 60 additions & 8 deletions deploy/bindata_generated.go

Generated file; diff not rendered.

26 changes: 25 additions & 1 deletion deploy/crd/pmem-csi.intel.com_pmemcsideployments.yaml
@@ -57,7 +57,7 @@ spec:
properties:
controllerDriverResources:
description: ControllerDriverResources Compute resources required
-by driver container running on master node
+by central driver container
properties:
limits:
additionalProperties:
@@ -82,6 +82,12 @@ spec:
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
+controllerTLSSecret:
+description: ControllerTLSSecret is the name of a secret which contains
+ca.crt, tls.crt and tls.key data for the scheduler extender and
+pod mutation webhook. A controller is started if (and only if) this
+secret is specified.
+type: string
deviceMode:
description: DeviceMode to use to manage PMEM devices.
enum:
@@ -112,6 +118,15 @@ spec:
logLevel:
description: LogLevel number for the log verbosity
type: integer
+mutatePods:
+description: MutatePod defines how a mutating pod webhook is configured
+if a controller is started. The field is ignored if the controller
+is not enabled. The default is "Try".
+enum:
+- Always
+- Try
+- Never
+type: string
nodeDriverResources:
description: NodeDriverResources Compute resources required by driver
container running on worker nodes
@@ -214,6 +229,15 @@ spec:
to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/'
type: object
type: object
+schedulerNodePort:
+description: SchedulerNodePort, if non-zero, ensures that the "scheduler"
+service is created as a NodeService with that fixed port number.
+Otherwise that service is created as a cluster service. The number
+must be from the range reserved by Kubernetes for node ports. This
+is useful if the kube-scheduler cannot reach the scheduler extender
+via a cluster service.
+format: int32
+type: integer
type: object
status:
description: DeploymentStatus defines the observed state of Deployment
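
The three new CRD fields above interact in a simple way. The following Go fragment is only a sketch of the documented rules, not the operator's actual code: the `DeploymentSpec` type, the helper names, the secret name and the port number are all hypothetical and chosen for illustration.

```go
package main

import "fmt"

// DeploymentSpec is a hypothetical, trimmed-down stand-in for the
// PmemCSIDeployment spec. Only the three fields added above are modeled.
type DeploymentSpec struct {
	ControllerTLSSecret string // empty: no controller is started
	MutatePods          string // "Always", "Try" or "Never"; empty: use the default
	SchedulerNodePort   int32  // zero: plain cluster service
}

// controllerEnabled mirrors the documented rule that a controller is
// started if (and only if) a TLS secret is specified.
func controllerEnabled(spec DeploymentSpec) bool {
	return spec.ControllerTLSSecret != ""
}

// effectiveMutatePods applies the documented default of "Try".
func effectiveMutatePods(spec DeploymentSpec) string {
	if spec.MutatePods == "" {
		return "Try"
	}
	return spec.MutatePods
}

// schedulerServiceType returns the Kubernetes service type implied by
// SchedulerNodePort: a node port service for a non-zero port, otherwise
// a cluster-internal service.
func schedulerServiceType(spec DeploymentSpec) string {
	if spec.SchedulerNodePort != 0 {
		return "NodePort"
	}
	return "ClusterIP"
}

func main() {
	// Secret name and port are made-up example values.
	spec := DeploymentSpec{
		ControllerTLSSecret: "example-controller-secret",
		SchedulerNodePort:   32000,
	}
	fmt.Println(controllerEnabled(spec), effectiveMutatePods(spec), schedulerServiceType(spec))
}
```

Keeping the node port out of the default (zero) case matches the commit message: the node port is a testing workaround and has to be requested explicitly.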
7 changes: 7 additions & 0 deletions deploy/kustomize/operator/operator.yaml
@@ -48,6 +48,7 @@ rules:
- ""
resources:
- pods
+- secrets
verbs:
- get
---
@@ -77,6 +78,12 @@ rules:
- pmemcsideployments/finalizers
verbs:
- '*'
+- apiGroups:
+- admissionregistration.k8s.io
+resources:
+- mutatingwebhookconfigurations
+verbs:
+- '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
7 changes: 7 additions & 0 deletions deploy/operator/pmem-csi-operator.yaml
@@ -62,6 +62,7 @@ rules:
- ""
resources:
- pods
+- secrets
verbs:
- get
---
@@ -91,6 +92,12 @@ rules:
- pmemcsideployments/finalizers
verbs:
- '*'
+- apiGroups:
+- admissionregistration.k8s.io
+resources:
+- mutatingwebhookconfigurations
+verbs:
+- '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
2 changes: 1 addition & 1 deletion deploy/yamls.go
@@ -40,7 +40,7 @@ func init() {
for _, file := range AssetNames() {
parts := re.FindStringSubmatch(file)
if parts == nil {
-panic(fmt.Sprintf("unexpected deployment asset: %s", file))
+continue
}
kubernetes, err := version.Parse(parts[1])
if err != nil {
19 changes: 13 additions & 6 deletions docs/design.md
@@ -292,15 +292,22 @@ components that help with pod scheduling:

### Scheduler extender

-When a pod requests the special [extended
+When a pod requests a special [extended
resource](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#extended-resources)
-called `pmem-csi.intel.com/scheduler`, the Kubernetes scheduler calls
+, the Kubernetes scheduler calls
a [scheduler
extender](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/scheduler_extender.md)
provided by PMEM-CSI with a list of nodes that a pod might run
-on. This extender is implemented in the PMEM-CSI controller and
-connects to node driver's metrics endpoint to check for
-capacity. PMEM-CSI then filters out all nodes which currently do not
+on.
+
+The name of that special resource is `<CSI driver name>/scheduler`,
+i.e. `pmem-csi.intel.com/scheduler` when the default PMEM-CSI driver
+name is used. It is possible to configure one extender per PMEM-CSI
+deployment because each deployment has its own unique driver name.
+
+This extender is implemented in the PMEM-CSI controller and retrieves
+metrics data from each PMEM-CSI node driver instance to filter out all
+nodes which currently do not
have enough storage left for the volumes that still need to be
created. This considers inline ephemeral volumes and all unbound
volumes, regardless whether they use late binding or immediate
@@ -328,7 +335,7 @@ See our [implementation](http://github.com/intel/pmem-csi/tree/devel/pkg/schedul

### Pod admission webhook

-Having to add `pmem-csi.intel.com/scheduler` manually is not
+Having to add the `<CSI driver name>/scheduler` extended resource manually is not
user-friendly. To simplify this, PMEM-CSI provides a [mutating
admission
webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)