Dynamic ZFS provisioner for Kubernetes


kubernetes-zfs-provisioner is a dynamic ZFS persistent volume provisioner for Kubernetes. It creates ZFS datasets via SSH on remote hosts and shares them via NFS to make them mountable to pods.

[Diagram: architecture with NFS]

Alternatively, if the ZFS hosts are part of the cluster, HostPath is also possible, but the PersistentVolume objects will have a NodeAffinity configured.

[Diagram: architecture with HostPath]

As a third option, if the ZFS host is part of the cluster, you can let the provisioner choose between NFS and HostPath with the Auto mode. If the requested AccessModes in the PersistentVolumeClaim contain ReadWriteOnce (the volume can only be accessed by pods running on the same node) or ReadWriteOncePod (the volume can only be accessed by a single pod at any time), then HostPath is used and a NodeAffinity is configured on the PersistentVolume objects so that the scheduler automatically places the corresponding pods onto the ZFS host. Otherwise NFS is used and no NodeAffinity is set. If multiple (exclusive) AccessModes are given, NFS takes precedence.
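
For illustration, a minimal PersistentVolumeClaim against an Auto-type storage class could look like this (the claim name and the zfs-auto class name are assumptions; see the storage class examples further down). Because it only requests ReadWriteOnce, the provisioner would pick HostPath for it:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-data                # hypothetical claim name
spec:
  storageClassName: zfs-auto   # an Auto-type storage class (see below)
  accessModes:
    - ReadWriteOnce            # Auto mode resolves this claim to HostPath
  resources:
    requests:
      storage: 10Gi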

Currently all ZFS attributes are inherited from the parent dataset.

For more information about external storage in kubernetes, see kubernetes-sigs/sig-storage-lib-external-provisioner.

Installation

The recommended installation option is via the Helm chart.

Configuration

The provisioner relies on a zpool and a parent dataset that have already been set up by the administrator. It also needs SSH access to the target ZFS hosts, i.e. the SSH private key and config have to be mounted into the container so that the executing user can find them.
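
A minimal preparation sketch, assuming a pool named tank, a parent dataset tank/kubernetes and a dedicated zfs-provisioner user on the ZFS host (all of these names are assumptions):

# On the ZFS host: create the parent dataset under the existing pool
zfs create tank/kubernetes

# Generate a key pair for the provisioner and install the public key
# for the user that will run the zfs commands on the ZFS host
ssh-keygen -t ed25519 -f zfs-provisioner-key -N ""
ssh-copy-id -i zfs-provisioner-key.pub zfs-provisioner@storage-1.domain.tld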

Provisioner

By default the container image should work out of the box when installed in the cluster. The only thing to configure is SSH; the Helm chart should help you with that.

The provisioner can be configured via the following environment variables:

  • ZFS_METRICS_PORT: Port on which to export Prometheus metrics. Default: 8080
  • ZFS_METRICS_ADDR: Interface binding address on which to export Prometheus metrics. Default: 0.0.0.0
  • ZFS_KUBE_CONFIG_PATH: Kubeconfig file path in which the credentials and API URL are defined. Default: empty
  • ZFS_PROVISIONER_INSTANCE: The instance name; needs to be unique if multiple provisioners are deployed. Default: pv.kubernetes.io/zfs
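
For example, a Deployment could override the defaults like this (excerpt of the container spec; the values are purely illustrative):

env:
  - name: ZFS_METRICS_PORT
    value: "9090"
  - name: ZFS_PROVISIONER_INSTANCE
    value: pv.kubernetes.io/zfs-cluster-a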

The provisioner instance name is also stored on the created dataset as a ZFS user property of the form io.kubernetes.pv.zfs:managed_by. It is meant for system administrators and is not otherwise significant to the provisioner.
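
For instance, an administrator could list all datasets that carry this property under the parent dataset (paths are examples):

zfs get -r -s local io.kubernetes.pv.zfs:managed_by tank/kubernetes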

Storage Classes

The provisioner relies on properly configured storage classes. The following shows an example for the HostPath type.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: zfs-hostpath
provisioner: pv.kubernetes.io/zfs
reclaimPolicy: Delete
parameters:
  parentDataset: tank/kubernetes
  hostname: storage-1.domain.tld
  type: hostpath
  node: storage-1 # the kubernetes.io/hostname label if different than hostname parameter (optional)
  reserveSpace: true
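
As noted above, HostPath volumes are pinned to the ZFS host via NodeAffinity. An illustrative excerpt of what the resulting PersistentVolume could contain (the exact fields written by the provisioner may differ):

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - storage-1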

The following example configures a storage class for ZFS over NFS:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: zfs-nfs
provisioner: pv.kubernetes.io/zfs
reclaimPolicy: Retain
parameters:
  parentDataset: tank/kubernetes
  hostname: storage-1.domain.tld
  type: nfs
  shareProperties: rw,no_root_squash # without 'rw' the share defaults to read-only ('ro')
  reserveSpace: true

For NFS, you can also specify other options, as described in exports(5).
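
For example, a storage class could pass additional exports(5) options in the same comma-separated form (purely illustrative):

parameters:
  shareProperties: rw,no_root_squash,no_subtree_check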

The following example configures a storage class using the Auto type. The provisioner will decide whether HostPath or NFS is used based on the AccessModes requested by the PersistentVolumeClaim.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: zfs-auto
provisioner: pv.kubernetes.io/zfs
reclaimPolicy: Retain
parameters:
  parentDataset: tank/kubernetes
  hostname: storage-1.domain.tld
  type: auto
  node: storage-1 # the name of the node where the ZFS datasets are located.
  shareProperties: rw,no_root_squash
  reserveSpace: true

Notes

Reclaim policy

This provisioner supports the Delete and Retain reclaim policies, with Delete being the default if unspecified. The reclaim policy is also stored as a ZFS user property of the form io.kubernetes.pv.zfs:reclaim_policy for system administrators, but is not further significant to the provisioner.

Storage space

By default, the provisioner uses the refreservation and refquota ZFS attributes to limit storage space for volumes. Each volume cannot use more storage space than the given resource request, and it also reserves exactly that much. To disable this and enable thin provisioning, set reserveSpace to false in your storage class parameters. Snapshots do not count against the storage space limit; however, this provisioner does not create any snapshots or backups itself.
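
To verify the limits on a provisioned volume, you could run the following on the ZFS host (the dataset path is an example):

zfs get refquota,refreservation tank/kubernetes/pvc-<uid>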

See zfs(8) for more information.

Security

First of all: no warranties, use at your own risk.

Making a container image and creating ZFS datasets from a container is not exactly easy, as ZFS runs in the kernel. While it is possible to pass /dev/zfs to a container so that it can create and destroy datasets within the container, sharing the volume with NFS does not work.

Setting the sharenfs property to anything other than off invokes exportfs(8), which also requires a running NFS server to reload its exports. That is not the case inside a container (see zfs(8)).

But most importantly: mounting /dev/zfs inside the provisioner container would mean that datasets can only be created on the same host the container currently runs on.

So, in order to "break out" of the container, the zfs calls are wrapped and redirected to another host over SSH. This requires an SSH private key to be mounted in the container for an SSH user with sufficient permissions to run zfs commands on the target host.
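
A minimal SSH client config that could be mounted alongside the key (host name, user and paths are assumptions; the host key must also be present in known_hosts):

Host storage-1.domain.tld
    User zfs-provisioner
    IdentityFile ~/.ssh/id_ed25519
    StrictHostKeyChecking yes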

Example sudoers file in /etc/sudoers.d/zfs-provisioner (On the ZFS host):

zfs-provisioner ALL=(ALL) NOPASSWD:/sbin/zfs *,/bin/chmod *

For increased performance and security, install ZFS on all Kubernetes nodes that should provide ZFS storage. Then it is possible to create PersistentVolume objects with HostPath. This eliminates network latency over unencrypted NFS, but restricts scheduling of the pods to the ZFS hosts.

Development

Requirements

  • go
  • docker
  • ZFS and NFS (run make install:zfs on Debian/Ubuntu if not already installed)

Building and Testing

Run make help to see which target does what.

Troubleshooting

Filesystem created, but not shared

controller.go:920] error syncing claim "56ea786a-e376-4911-a4b1-7b040dc3537f": failed to provision volume
with StorageClass "zfs-retain-pve-1": creating ZFS dataset failed: exit status 1:
"/usr/bin/zfs zfs create -o sharenfs=rw,no_root_squash ... tank/kubernetes/
pvc-56ea786a-e376-4911-a4b1-7b040dc3537f" => cannot share 'tank/kubernetes/
pvc-56ea786a-e376-4911-a4b1-7b040dc3537f': share(1M) failed
filesystem successfully created, but not shared

This happens when the dataset was created, but invoking zfs share failed. Most likely this is because exportfs(8) is invoked (as stated in zfs(8)), which talks to the NFS server.

So: do you have nfs-kernel-server installed on the host, and is exportfs available?
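
Quick checks on the ZFS host (package and service names assume Debian/Ubuntu):

systemctl status nfs-kernel-server
command -v exportfs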

Once you solve this, destroy the dataset again, as subsequent retries will otherwise fail forever:

cannot create 'tank/services/kubernetes/pvc-56ea786a-e376-4911-a4b1-7b040dc3537f': dataset already exists
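
A sketch of the cleanup, using the dataset path from the error above:

zfs destroy tank/services/kubernetes/pvc-56ea786a-e376-4911-a4b1-7b040dc3537f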

Credits

Thanks to Gentics for open sourcing the initial version!

I (@ccremer) have been allowed to take over maintenance for this repository.