Kubernetes discovery protocol for JGroups

KUBE_PING is a discovery protocol for JGroups cluster nodes managed by Kubernetes.

Note
Another discovery protocol, DNS_PING (http://www.jgroups.org/manual5/index.html#_dns_ping), is container-agnostic (Docker, OpenShift, Kubernetes) and can be considered as an alternative.

Since Kubernetes is in charge of launching nodes, it knows the IP addresses of all pods it started, and is therefore the best place to ask for cluster discovery.

Discovery is done by asking Kubernetes for a list of the IP addresses of all cluster nodes.

Combined with bind_port / port_range, the protocol will then send a discovery request to all instances and wait for the responses.

A sample configuration looks like this:

Sample KUBE_PING config
  <TCP
     bind_addr="loopback,match-interface:eth0"
     bind_port="7800"
     ...
  />
  <org.jgroups.protocols.kubernetes.KUBE_PING
     port_range="1"
     namespace="${KUBERNETES_NAMESPACE:production}"
     labels="${KUBERNETES_LABELS:cluster=nyc}"
  />
  ...

When a discovery is started, KUBE_PING asks Kubernetes for a list of the IP addresses of all pods which it launched, matching the given namespace and labels (see below).

Let’s say Kubernetes launched a cluster of 3 pods with IP addresses 172.17.0.2, 172.17.0.3 and 172.17.0.5 (all launched into the same namespace and without any (or the same) labels).

On a discovery request, Kubernetes returns a list of 3 IP addresses. With bind_port="7800" and port_range="1" as configured above, JGroups knows that the port of each member was allocated from the range [7800 .. 7801] (bind_port .. (bind_port + port_range) in TCP).

KUBE_PING therefore sends discovery requests to members at addresses 172.17.0.2:7800, 172.17.0.2:7801, 172.17.0.3:7800, 172.17.0.3:7801, 172.17.0.5:7800 and 172.17.0.5:7801.
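
The expansion from pod IPs to probe targets can be sketched as below. This is an illustrative sketch, not KUBE_PING's actual code: the class ProbeTargets and its expand method are hypothetical names, and the IP addresses, bind_port and port_range are taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JGroups): expands pod IPs into probe targets. */
final class ProbeTargets {

    /** Returns "ip:port" for every port in [bindPort .. bindPort + portRange]. */
    static List<String> expand(List<String> podIps, int bindPort, int portRange) {
        List<String> targets = new ArrayList<>();
        for (String ip : podIps) {
            // portRange counts the additional ports, so the upper bound is inclusive
            for (int port = bindPort; port <= bindPort + portRange; port++) {
                targets.add(ip + ":" + port);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> podIps = List.of("172.17.0.2", "172.17.0.3", "172.17.0.5");
        // Prints the six addresses listed above, one per line
        expand(podIps, 7800, 1).forEach(System.out::println);
    }
}
```

With port_range set to 0, only the bind_port itself would be probed on each pod.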

Note
Clients need permission to ask Kubernetes for a list of pod IPs. This permission is granted by creating RBAC resources, e.g. with YAML. See the service account, cluster role and cluster role binding at the top of the Demo section below.

Separating different clusters

If pods with containers in different clusters are launched, we’d get a list of IP addresses of all nodes, not just the ones in the same cluster, as Kubernetes knows nothing about clusters.

If we start multiple clusters, we need to separate them using namespaces and/or labels. If no namespaces or labels were used, things would still work, but we’d see warning messages in the logs about messages from different clusters that were discarded.

Namespaces

Namespaces can be used to separate deployments, services and pods. The example below shows the namespaces default (the default namespace) and kube-system (the namespace used by Kubernetes for its own deployments, pods etc.):

[belasmac] /Users/bela/IspnPerfTest$ kubectl get services,deployments,pods --all-namespaces
NAMESPACE     NAME                       CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
default       svc/ispn-perf-test         10.0.0.137   <nodes>       8090:32700/TCP   8h
default       svc/kubernetes             10.0.0.1     <none>        443/TCP          53d
kube-system   svc/kube-dns               10.0.0.10    <none>        53/UDP,53/TCP    53d
kube-system   svc/kubernetes-dashboard   10.0.0.30    <nodes>       80:30000/TCP     53d

NAMESPACE   NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default     deploy/ispn-perf-test2   3         3         3            3           8h

NAMESPACE     NAME                                  READY     STATUS    RESTARTS   AGE
default       po/ispn-perf-test2-2831437780-g07c5   1/1       Running   1          8h
default       po/ispn-perf-test2-2831437780-np83d   1/1       Running   0          6h
default       po/ispn-perf-test2-2831437780-szl60   1/1       Running   1          8h
kube-system   po/kube-addon-manager-minikube        1/1       Running   12         53d
kube-system   po/kube-dns-v20-b63lk                 3/3       Running   34         53d
kube-system   po/kubernetes-dashboard-73c7v         1/1       Running   11         53d

The last section shows 6 pods that have been started, 3 in the default namespace and 3 by Kubernetes itself in the kube-system namespace. The following 3 namespaces exist:

[belasmac] /Users/bela$ kubectl get namespaces
NAME          STATUS    AGE
belaban       Active    2d
default       Active    53d
kube-system   Active    53d

To create a new namespace foo, run kubectl create namespace foo.

Namespaces have to be created and deleted manually via kubectl.

When launching a new pod or deployment, the namespace can be given with -n <namespace> or --namespace=<namespace>. Alternatively, the namespace can be defined in the YAML or JSON config passed to kubectl create -f <config>.

In the JGroups configuration, the namespace attribute defines the namespace to be used for discovery. The example above used namespace="${KUBERNETES_NAMESPACE:production}". This means that the namespace is

  • the value of system property KUBERNETES_NAMESPACE, if set,

  • the value of environment variable KUBERNETES_NAMESPACE, if set, or

  • "production" if neither the system property nor the environment variable is set

In the example above, Kubernetes will return only IP addresses of pods created in namespace "production".
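
The lookup order described above could be sketched roughly as follows. PropertyDefaults and resolve are hypothetical names used for illustration, not JGroups API; the environment is passed in as a map so that the sketch can be exercised without changing the real process environment.

```java
import java.util.Map;

/** Hypothetical sketch (not JGroups code) of resolving ${NAME:default}. */
final class PropertyDefaults {

    /** System property first, then environment variable, then the fallback. */
    static String resolve(String name, String fallback, Map<String, String> env) {
        String value = System.getProperty(name);
        if (value == null) {
            value = env.get(name);
        }
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Mirrors namespace="${KUBERNETES_NAMESPACE:production}"
        String namespace = resolve("KUBERNETES_NAMESPACE", "production", System.getenv());
        System.out.println("namespace = " + namespace);
    }
}
```

Running it with -DKUBERNETES_NAMESPACE=staging would print the system property value even if the environment variable is also set.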

Labels

Labels are similar to namespaces; different labels can separate clusters running inside the same namespace.

Similarly to namespaces, labels also have to be defined when launching a pod, either via --labels=<label> passed to kubectl run, or in the YAML or JSON configuration file passed to kubectl create -f <config>.

In the sample configuration above, labels were defined as labels="${KUBERNETES_LABELS:cluster=nyc}". This means that Kubernetes will only return IP addresses of pods started with label cluster=nyc.

Both a namespace and labels can be set; in this case, Kubernetes will return the IP addresses of all pods started in the given namespace and matching the given label(s).

If neither namespace nor labels are set, then the IP addresses of all pods created by Kubernetes in the default namespace "default" will be returned.
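
For illustration, namespace and labels map naturally onto the pod list endpoint of the Kubernetes REST API (/api/v1/namespaces/<namespace>/pods with a labelSelector query parameter). The sketch below builds such a URL. PodListUrl is a hypothetical helper, the host 10.0.0.1 and port 443 are taken from the kubectl output earlier, and the exact request issued by KUBE_PING may differ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not JGroups code): builds a pod list URL. */
final class PodListUrl {

    static String build(String protocol, String host, int port,
                        String apiVersion, String namespace, String labels) {
        String url = String.format("%s://%s:%d/api/%s/namespaces/%s/pods",
                protocol, host, port, apiVersion, namespace);
        // Labels are optional; without them all pods in the namespace match
        if (labels != null && !labels.isEmpty()) {
            url += "?labelSelector=" + URLEncoder.encode(labels, StandardCharsets.UTF_8);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(build("https", "10.0.0.1", 443, "v1", "production", "cluster=nyc"));
        System.out.println(build("https", "10.0.0.1", 443, "v1", "default", null));
    }
}
```

The label string is URL-encoded because a selector such as cluster=nyc contains characters that are not safe in a query parameter value.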

KUBE_PING configuration

|===
|Attribute name |System property |Default |Description

|port_range | |1 |Number of additional ports to be probed for membership. A port_range of 0 does not probe additional ports. Example: initial_hosts=A[7800] port_range=0 probes A:7800, port_range=1 probes A:7800 and A:7801.
|connectTimeout |KUBERNETES_CONNECT_TIMEOUT |5000 |Maximum time (in milliseconds) to wait for a connection to the Kubernetes server. If exceeded, an exception will be thrown.
|readTimeout |KUBERNETES_READ_TIMEOUT |30000 |Maximum time (in milliseconds) to wait for a response from the Kubernetes server.
|operationAttempts |KUBERNETES_OPERATION_ATTEMPTS |3 |Maximum number of attempts to send discovery requests.
|operationSleep |KUBERNETES_OPERATION_SLEEP |1000 |Time (in milliseconds) between operation attempts.
|masterProtocol |KUBERNETES_MASTER_PROTOCOL |https |Scheme (http or https) used to send the initial discovery request to the Kubernetes server.
|masterHost |KUBERNETES_SERVICE_HOST | |The host (name or IP address) of the Kubernetes server.
|masterPort |KUBERNETES_SERVICE_PORT | |The port on which the Kubernetes server is listening.
|apiVersion |KUBERNETES_API_VERSION |v1 |The version of the API used to talk to the Kubernetes server.
|namespace |KUBERNETES_NAMESPACE |default |The namespace to be used.
|labels |KUBERNETES_LABELS | |The labels to use in the discovery request to the Kubernetes server.
|clientCertFile |KUBERNETES_CLIENT_CERTIFICATE_FILE | |Certificate to access the Kubernetes server.
|clientKeyFile |KUBERNETES_CLIENT_KEY_FILE | |Client key file (store).
|clientKeyPassword |KUBERNETES_CLIENT_KEY_PASSWORD | |The password to access the client key store.
|clientKeyAlgo |KUBERNETES_CLIENT_KEY_ALGO |RSA |The algorithm used by the client.
|caCertFile |KUBERNETES_CA_CERTIFICATE_FILE |/var/run/secrets/kubernetes.io/serviceaccount/ca.crt |Client CA certificate.
|saTokenFile |SA_TOKEN_FILE |/var/run/secrets/kubernetes.io/serviceaccount/token |Token file.
|dump_requests | |false |Dumps all discovery requests and responses to the Kubernetes server to stdout when true.
|split_clusters_during_rolling_update |KUBERNETES_SPLIT_CLUSTERS_DURING_ROLLING_UPDATE |false |During a rolling update, prevents all pods from being put into a single cluster.
|useNotReadyAddresses |KUBERNETES_USE_NOT_READY_ADDRESSES |true |True if initial discovery should take pods that are not ready into consideration.
|===
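
As a rough sketch of how operationAttempts and operationSleep could interact, the following retries an operation a bounded number of times with a fixed pause between attempts. RetrySketch is a hypothetical class written for this illustration; the actual retry logic in KUBE_PING may differ in detail.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper (not part of JGroups). */
final class RetrySketch {

    /** Calls operation up to 'attempts' times, sleeping between failed attempts. */
    static <T> T withRetries(Supplier<T> operation, int attempts, long sleepMillis) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < attempts) {
                    pause(sleepMillis); // no pause after the final attempt
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that fails twice before succeeding
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("server not reachable yet");
            }
            return "pod list";
        }, 3, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The values 3 and 1000 in main correspond to the defaults of operationAttempts and operationSleep in the table above.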

Demo

In this demo, we’re going to let Kubernetes start 3 instances of IspnPerfTest via a YAML configuration. Then we’ll run a separate instance interactively and confirm that the instances have formed a cluster of 4. All instances are created in the default namespace and no labels are used.

Copy and paste the snippet below into a terminal where kubectl is configured to talk to your Kubernetes cluster:

# ---------------------------------------------------------------------
# This demo assumes that RBAC is enabled on the Kubernetes cluster.
#
# The serviceaccount, clusterrole and clusterrolebinding provide
# permission for the pods to query K8S api
# ---------------------------------------------------------------------

# Change to a Kubernetes namespace of your preference
export TARGET_NAMESPACE=default

kubectl create serviceaccount jgroups-kubeping-service-account -n $TARGET_NAMESPACE

cat <<EOF | kubectl apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: jgroups-kubeping-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jgroups-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jgroups-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: jgroups-kubeping-service-account
  namespace: $TARGET_NAMESPACE

---

apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
    name: ispn-perf-test
    namespace: $TARGET_NAMESPACE
  spec:
    replicas: 3
    selector:
      matchLabels:
        run: ispn-perf-test
    template:
      metadata:
        labels:
          run: ispn-perf-test
      spec:
        serviceAccountName: jgroups-kubeping-service-account
        containers:
        - args:
          - /opt/ispn/IspnPerfTest/bin/kube.sh
          - -nohup
          env:
          - name: KUBERNETES_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: belaban/ispn_perf_test
          name: ispn-perf-test
          resources: {}
          terminationMessagePath: /dev/termination-log
kind: List
metadata: {}

EOF

To remove the resources when demo time is over:

kubectl delete deployment/ispn-perf-test clusterrolebinding/jgroups-kubeping-api-access clusterrole/jgroups-kubeping-pod-reader serviceaccount/jgroups-kubeping-service-account -n $TARGET_NAMESPACE

The image is belaban/ispn_perf_test, which contains the IspnPerfTest project plus some scripts to start nodes. 3 instances are started with the command kube.sh -nohup (see args in the deployment above); this launches the program without the loop which reads commands from stdin.

kubectl get pods confirms that 3 instances have been created:

[belasmac] /Users/bela/kubetest$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
ispn-perf-test-2224433472-6l456   1/1       Running   0          29s
ispn-perf-test-2224433472-ksh58   1/1       Running   0          29s
ispn-perf-test-2224433472-rlr0m   1/1       Running   0          29s

We can now run a shell in one of the nodes and confirm that a cluster of 3 has formed. First, we have to exec a bash shell in one of the 3 nodes:

[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-perf-test-2224433472-rlr0m bash
bash-4.3$

Now probe can be used to list all cluster members:

bash-4.3$ cd IspnPerfTest/
bash-4.3$ bin/probe.sh
-- sending probe request to /224.0.75.75:7500

#1 (300 bytes):
local_addr=ispn-perf-test-2224433472-rlr0m-12151
physical_addr=172.17.0.5:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#2 (299 bytes):
local_addr=ispn-perf-test-2224433472-ksh58-1200
physical_addr=172.17.0.6:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#3 (300 bytes):
local_addr=ispn-perf-test-2224433472-6l456-41832
physical_addr=172.17.0.7:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

3 responses (3 matches, 0 non matches)

As can be seen, every member has the same view [ispn-perf-test-2224433472-ksh58-1200|2] (3) containing 3 members, so the cluster has formed correctly.

Now a fourth instance can be created, but this time we’ll enable the event loop reading from stdin. To this end, we have to use kubectl run -it (-it for interactive, with a terminal attached):

[belasmac] /Users/bela/kubetest$ kubectl run ispn -it --rm=true --image=belaban/ispn_perf_test kube.sh
Waiting for pod default/ispn-3105267510-nr9dp to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-29942, cluster=default, physical address=172.17.0.8:7800
-------------------------------------------------------------------

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-43008, cluster=cfg, physical address=172.17.0.8:7900
-------------------------------------------------------------------
created 100,000 keys: [1-100,000], old key set size: 0
Fetched config from ispn-perf-test-2224433472-ksh58-51617: {print_details=true, num_threads=100, print_invokers=false, num_keys=100000, time_secs=60, msg_size=1000, read_percentage=1.0}
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true)  [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all

This starts the instance, which should have joined the cluster; the cluster should now have 4 nodes. This can be confirmed by running probe.sh again in the other shell, or by pressing 2 ([2] View):

2

-- local: ispn-3105267510-nr9dp-43008
-- view: [ispn-perf-test-2224433472-ksh58-51617|3] (4) [ispn-perf-test-2224433472-ksh58-51617, ispn-perf-test-2224433472-rlr0m-11878, ispn-perf-test-2224433472-6l456-28251, ispn-3105267510-nr9dp-43008]

We can see that the view is now [ispn-perf-test-2224433472-ksh58-51617|3] (4), and the cluster has correctly added the fourth member.

Running on Google Kubernetes Engine

The commands for running on Google Kubernetes Engine (GKE, formerly Google Container Engine) are the same as when running locally in minikube.

The only difference is that on GKE, unlike in minikube, IP multicasting is not available. This means that the probe.sh command has to be run as probe.sh -addr localhost instead of simply running probe.sh.