
Deploying CaaSP CAP on ECP


I spent many days testing different deployment configurations on ECP, because ever since pod security policies (PSPs) had to be configured manually I have not had a successful deployment, and we doc peeps are getting conflicting information. We must have accurate information to write up for our customers.

The basis for my testing is Setup CAP on CaaSP on ECP. Prabal's scripts automate creating an NFS storage class and applying PSPs. I forked SUSE/cf-ci to test various PSP configurations. The steps on this page create a successful deployment using the original SUSE/cf-ci configurations. The PSPs do not make sense to me as it seems the end result is akin to mode 0777, but at least I get a working deployment.

I tried upgrading CaaSP 3.0 (transactional-update up), and my CAP deployments failed. I do not know why, and have not had time yet to test different PSP configurations. So the following steps use the stock CaaSP 3.0 GMC image in ECP.

This is all fragile, and fixing a damaged deployment is difficult, so the CAP Guides must present exact perfect steps for customers to get it right the first time.

Run all commands on your workstation, except where noted that they are run on your Kube master node.

  1. Create a CaaSP cluster on ECP:
$ git clone https://github.com/prabalsharma/automation.git
$ cd automation/caasp-openstack-heat

Edit heat-environment.yaml.example with your DNS server and desired internal Kube cluster network range. Do not overlap with the CaaSP defaults of 172.16.0.0/13 and 172.24.0.0/16. Current usable DNS servers are 10.84.2.20, 10.84.2.21, and 10.84.100.100.

This is my heat-environment.yaml.example file:

---
parameters:
  root_password: password
  admin_flavor: m1.large
  master_flavor: m1.xlarge
  worker_flavor: m1.xlarge
  external_net: floating
  internal_net_cidr: 172.24.8.0/24
  dns_nameserver: 10.84.100.100
  worker_num_volumes: 0
  worker_volume_size: 60

Create your cluster with this command:

./caasp-openstack --build -m 1 -w 3 --openrc <path to your ECP openrc.sh> --image CaaSP-3.0.0-GMC --name <your stack name>
  2. When you see Velum started!, open a web browser to the floating IP address assigned to the admin node + omg.howdoi.website, e.g. https://10.86.2.234.omg.howdoi.website. Use the admin node internal IP address for the internal dashboard location address, and check the box to install Tiller. Continue through the screens for selecting nodes. On the screen for configuring the External Kubernetes API FQDN and External Dashboard FQDN, use the master and admin floating IP addresses + omg.howdoi.website, e.g. 10.86.2.234.omg.howdoi.website and 10.86.2.119.omg.howdoi.website, then bootstrap the cluster.

  3. After the new CaaSP cluster has bootstrapped, download your kubeconfig file and verify that you can connect to the cluster with kubectl get nodes. Then apply the SUSE/cf-ci scripts to set up PSPs and create an NFS storage class on the worker nodes by opening an SSH session to your master node, cloning the cf-ci repo, and running the cluster prep script:

$ ssh root@<master-ip>
password: password
# git clone https://github.com/SUSE/cf-ci.git
# bash cf-ci/automation-scripts/prep-new-cluster.sh

Run kubectl get storageclass and kubectl get psp to verify.
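
For reference, a verification pass from the workstation might look like this (a sketch that assumes the kubeconfig downloaded from Velum is saved in the current directory as kubeconfig):

$ export KUBECONFIG=$PWD/kubeconfig
$ kubectl get nodes
$ kubectl get storageclass
$ kubectl get psp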

This is the cap-psp-rbac.yaml file applied by prep-new-cluster.sh:

---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: suse.cap.psp
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  # Privileged
  #privileged: false      	<<< default in suse.caasp.psp.unprivileged
  privileged: true
  # Volumes and File Systems
  volumes:
    # Kubernetes Pseudo Volume Types
    - configMap
    - secret
    - emptyDir
    - downwardAPI
    - projected
    - persistentVolumeClaim
    # Networked Storage
    - nfs
    - rbd
    - cephFS
    - glusterfs
    - fc
    - iscsi
    # Cloud Volumes
    - cinder
    - gcePersistentDisk
    - awsElasticBlockStore
    - azureDisk
    - azureFile
    - vsphereVolume
  allowedFlexVolumes: []
  allowedHostPaths:
    # Note: We don't allow hostPath volumes above, but set this to a path we
    # control anyway as a belt+braces protection. /dev/null may be a better
    # option, but the implications of pointing this towards a device are
    # unclear.
    - pathPrefix: /opt/kubernetes-hostpath-volumes
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  #allowPrivilegeEscalation: false	   <<< default in suse.caasp.psp.unprivileged
  allowPrivilegeEscalation: true
  #defaultAllowPrivilegeEscalation: false  <<< default in suse.caasp.psp.unprivileged
  # Capabilities
  allowedCapabilities: []
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: false
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: suse:cap:psp
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['suse.cap.psp']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cap:clusterrole
roleRef:
  kind: ClusterRole
  name: suse:cap:psp
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: default
  namespace: uaa
- kind: ServiceAccount
  name: default
  namespace: scf
- kind: ServiceAccount
  name: default
  namespace: stratos
- kind: ServiceAccount
  name: default
  namespace: pg-sidecar
- kind: ServiceAccount
  name: default
  namespace: mysql-sidecar
# Workaround test-brain serviceaccount psp issue for brains tests.
# We should remove the line which checks for this in run-test when we have a better
# way of adding the appropriate permissions to the brain tests
- kind: ServiceAccount
  name: test-brain
  namespace: scf
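
If you are not using the prep script, the same definitions can be applied by hand with kubectl; this assumes you have saved the file above locally as cap-psp-rbac.yaml:

$ kubectl apply -f cap-psp-rbac.yaml
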
  4. Configure the CAP deployment

In ECP, associate a floating IP address with one of your worker nodes; in the following examples it is 10.86.1.44. In a real production deployment there would be a load balancer or ingress controller and a real DNS/DHCP server; for quick testing, use the IP address of a worker node. Then use this as your domain address in your scf-config-values.yaml file.
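
The floating IP association can be done in the ECP web UI, or with the OpenStack client; this is a sketch that assumes your openrc.sh is sourced, uses the external network name floating from the heat environment above, and uses a placeholder for the worker server name:

$ openstack floating ip create floating
$ openstack server add floating ip <worker server name> 10.86.1.44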

After months of confusion, I finally figured out a working configuration for external_ips: these are the internal worker IP addresses that expose services externally. Use the internal IP addresses of your worker nodes, and also enter your domain IP address, which is required for Stratos. (A quick way to list the internal worker IPs is shown after the example below.)

env:    
    DOMAIN: 10.86.1.44.omg.howdoi.website
    UAA_HOST: uaa.10.86.1.44.omg.howdoi.website
    UAA_PORT: 2793
    
kube:
    external_ips: ["10.86.1.44", "172.24.8.6", "172.24.8.24", "172.24.8.15"]
    
    storage_class: 
        persistent: "persistent"
        shared: "shared"
    
    registry: 
        hostname: "registry.suse.com"
        username: ""
        password: ""
    organization: "cap"
    
    auth: rbac    
    psp:
        privileged: "suse.cap.psp"
    
secrets:
    # Create a password for your CAP cluster
    CLUSTER_ADMIN_PASSWORD: password
    
    # Create a password for your UAA client secret
    UAA_ADMIN_CLIENT_SECRET: password

I left out the ["SYS_RESOURCE"] capability entries because including them resulted in failed deployments.
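
If you need to look up the internal worker IP addresses for external_ips, listing the nodes with wide output shows them in the INTERNAL-IP column (assuming your kubeconfig is still exported):

$ kubectl get nodes -o wide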

  5. Deploy UAA, SCF, and Stratos

Run the following commands to deploy CAP. Wait for each one to complete successfully before going on to the next. The only pods that may appear as 0/1 Completed are secret-generation-* and post-deployment-setup-*; all others must be 1/1 Running.

For Stratos, you must first create a new TCP 8443 rule for the worker security group in ECP.
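
The rule can be added in the ECP web UI, or with the OpenStack client; the security group name below is a placeholder for whatever group your Heat stack created for the workers:

$ openstack security group rule create --protocol tcp --dst-port 8443 <worker security group>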

$ helm install suse/uaa \
--name susecf-uaa \
--namespace uaa \
--values scf-config-values.yaml

$ watch -c 'kubectl get pods --namespace uaa'
  ##press ctrl+c to stop

$ SECRET=$(kubectl get pods --namespace uaa \
-o jsonpath='{.items[?(.metadata.name=="uaa-0")].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}')

$ CA_CERT="$(kubectl get secret $SECRET --namespace uaa \
-o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"

$ helm install suse/cf \
--name susecf-scf \
--namespace scf \
--values scf-config-values.yaml \
--set "secrets.UAA_CA_CERT=${CA_CERT}"

$ watch -c 'kubectl get pods --namespace scf'
  ##press ctrl+c to stop

$ helm install suse/console \
--name susecf-console \
--namespace stratos \
--values scf-config-values.yaml

$ watch -c 'kubectl get pods --namespace stratos'
  ##press ctrl+c to stop
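
Once all three charts are up, a quick smoke test from the workstation might look like this; a sketch assuming the cf CLI is installed and using the DOMAIN and CLUSTER_ADMIN_PASSWORD from the example scf-config-values.yaml above:

$ cf api --skip-ssl-validation https://api.10.86.1.44.omg.howdoi.website
$ cf login -u admin -p password
$ cf orgs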

I have heard discussion that the PSPs should be applied automatically via Helm charts, rather than requiring the CAP admin to apply them manually. Until this is resolved they must be applied manually, either by using the cf-ci scripts, or per the instructions in the CAP Guides. What do I tell customers?
