From 98d681ae6025c775f655c6095b51ea9efafa12e1 Mon Sep 17 00:00:00 2001 From: xwang2713 Date: Wed, 18 Sep 2019 14:00:03 -0400 Subject: [PATCH] Create/Update README.md files --- Deployment/dp-1/README.md | 175 +++++++------- ...{esp-e1-autoscale.yaml => esp-e1-hpa.yaml} | 0 Deployment/ebs/ebs-1/README.md | 1 + Deployment/efs/efs-1/README.md | 174 ++++++++++++++ README.md | 44 +++- RelicationController/rc-1/esp-e1-sf.yaml | 40 ---- ReplicationController/rc-1/README.md | 12 + .../rc-1/admin.yaml | 0 .../rc-1/esp-e1.yaml | 0 .../rc-1/roxie-r1.yaml | 0 .../rc-1/start | 0 .../rc-1/stop | 0 .../rc-1/support.yaml | 0 .../rc-1/thor-t1.yaml | 0 .../rc-1/thormaster-t1.yaml | 0 StatefulSet/README.md | 8 + StatefulSet/ebs/ebs-1/README.md | 71 +++--- StatefulSet/efs/efs-1/README.md | 134 ++++++++++- aws/README.md | 226 +----------------- bin/cluster_run.ps1 | 78 +++--- local/APPLE.md | 0 local/LINUX.md | 12 + local/MACOS.md | 10 + local/MINIKUBE.md | 20 +- local/README.md | 29 +-- local/WINDOWS.md | 1 + security/{README => README.md} | 0 27 files changed, 587 insertions(+), 448 deletions(-) rename Deployment/dp-1/{esp-e1-autoscale.yaml => esp-e1-hpa.yaml} (100%) create mode 100644 Deployment/efs/efs-1/README.md delete mode 100644 RelicationController/rc-1/esp-e1-sf.yaml create mode 100644 ReplicationController/rc-1/README.md rename {RelicationController => ReplicationController}/rc-1/admin.yaml (100%) rename {RelicationController => ReplicationController}/rc-1/esp-e1.yaml (100%) rename {RelicationController => ReplicationController}/rc-1/roxie-r1.yaml (100%) rename {RelicationController => ReplicationController}/rc-1/start (100%) rename {RelicationController => ReplicationController}/rc-1/stop (100%) rename {RelicationController => ReplicationController}/rc-1/support.yaml (100%) rename {RelicationController => ReplicationController}/rc-1/thor-t1.yaml (100%) rename {RelicationController => ReplicationController}/rc-1/thormaster-t1.yaml (100%) create mode 100644 StatefulSet/README.md delete mode 100644 local/APPLE.md create mode 100644 local/MACOS.md rename security/{README => README.md} (100%) diff --git a/Deployment/dp-1/README.md b/Deployment/dp-1/README.md index 5f0aa05..3a11050 100644 --- a/Deployment/dp-1/README.md +++ b/Deployment/dp-1/README.md @@ -1,97 +1,110 @@ -# HPCC-Kubernetes - -## Deploy a HPCC Cluster with Kubernetes Deployment -In Kubernetes a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) is responsible for replicating sets of identical pods. Like a _Service_ it has a selector query which identifies the members of it's set. Unlike a _Service_ it also has a desired number of replicas, and it will create or delete _Pods_ to ensure that the number of _Pods_ matches up with it's desired state. - -Make sure bin/bootstrap.[sh|bat] started first - -```sh +# Deploy HPCC Systems Cluster with Deployment + +This is a simple stateless deployment scenario. It can be used to both local and real cloud, such as AWS. 
+
+## Prerequisites
+- Bootstrap
+  AWS:
+  ```console
+  bin/bootstrap-aws.sh
+  ```
+  Local:
+  ```console
+  bin/bootstrap-local.sh
+  ```
+## Deploy HPCC Systems Cluster
+```console
 ./start
 ```
-To verify the thor and roxie are ready:
-```sh
+To make sure the Pods are up:
+```console
 kubectl get pods
-
-NAME                     READY     STATUS    RESTARTS   AGE
-esp-controller-bbgqu     1/1       Running   0          3m
-esp-controller-wc8ae     1/1       Running   0          3m
-roxie-controller-hmvo5   1/1       Running   0          3m
-roxie-controller-x7ksh   1/1       Running   0          3m
-thor-controller-2sbe5    1/1       Running   0          3m
-thor-controller-p1q7f    1/1       Running   0          3m
+NAME                               READY   STATUS    RESTARTS   AGE
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d16h
+esp-esp1-69b59769bd-94gm4          1/1     Running   0          16s
+hpcc-admin                         1/1     Running   0          19s
+roxie-roxie1-64d49d76cf-b28gh      1/1     Running   0          15s
+support-778c8ffbb-p44t7            1/1     Running   0          17s
+thor-thor1-75bb466cbf-skqj5        1/1     Running   0          13s
+thormaster-thor1                   1/1     Running   0          14s
 ```
-To start master instance:
-```sh
-kubectl create -f master-controller.yaml
+The cluster should be configured and started automatically.
+To verify the status:
+```console
+bin/cluster_run.sh status
+Status of esp-esp1-69b59769bd-94gm4:
+mydafilesrv ( pid 981 ) is running ...
+esp1 ( pid 1175 ) is running ...
+
+Status of roxie-roxie1-64d49d76cf-b28gh:
+mydafilesrv ( pid 969 ) is running ...
+roxie1 ( pid 1168 ) is running ...
+
+Status of support-778c8ffbb-p44t7:
+mydafilesrv ( pid 1006 ) is running ...
+mydali ( pid 1200 ) is running ...
+mydfuserver ( pid 1413 ) is running ...
+myeclagent ( pid 1629 ) is running ...
+myeclccserver ( pid 1832 ) is running ...
+myeclscheduler ( pid 2049 ) is running ...
+mysasha ( pid 2255 ) is running ...
+
+Status of thor-thor1-75bb466cbf-skqj5:
+mydafilesrv ( pid 962 ) is running ...
+
+Status of thormaster-thor1:
+mydafilesrv ( pid 969 ) is running ...
+thor1 ( pid 1214 ) is running with 1 slave process(es) ...
+```
+
+## Scale up/down
+The roxie-roxie1 Deployment initially has 1 replica. To increase it to 2 replicas:
+```console
+kubectl scale --replicas 2 Deployment/roxie-roxie1
+
+kubectl get pods
+NAME                               READY   STATUS    RESTARTS   AGE
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d16h
+esp-esp1-69b59769bd-94gm4          1/1     Running   0          3m15s
+hpcc-admin                         1/1     Running   0          3m18s
+roxie-roxie1-64d49d76cf-b28gh      1/1     Running   0          3m14s
+roxie-roxie1-64d49d76cf-mflqn      1/1     Running   0          11s
+support-778c8ffbb-p44t7            1/1     Running   0          3m16s
+thor-thor1-75bb466cbf-skqj5        1/1     Running   0          3m12s
+thormaster-thor1                   1/1     Running   0          3m13s
 ```
-Make sure it is up and ready:
-```sh
-kubectl get rc master-controller
-NAME                DESIRED   CURRENT   AGE
-master-controller   1         1         12h
-
-kubectl get pods
-NAME                      READY     STATUS    RESTARTS   AGE
-esp-controller-bbgqu      1/1       Running   0          5m
-esp-controller-wc8ae      1/1       Running   0          5m
-master-controller-wa5z8   1/1       Running   0          5m
-roxie-controller-hmvo5    1/1       Running   0          5m
-roxie-controller-x7ksh    1/1       Running   0          5m
-thor-controller-2sbe5     1/1       Running   0          5m
-thor-controller-p1q7f     1/1       Running   0          5m
-
-
-### Access ECLWatch and Verify the cluster
-Get mastr ip:
-```sh
-kubectl get pod master-controller-ar6jn -o json | grep podIP
-    "podIP": "172.17.0.5",
+To scale it back down:
+```console
+kubectl scale --replicas 1 Deployment/roxie-roxie1
 ```
-If everything run OK you should access ECLWatch to verify the configuration: ```http://172.17.0.5:8010```. Again if you can't access the private ip you can try to tunnel it above described in deploy single HPCC instance.
-If something go wrong you can access the master instance:
-```sh
-kubectl exec master-controller-ar6jn -i -t -- bash -il
+## Auto-scaling
+A sample autoscaling manifest, esp-e1-hpa.yaml, is provided.
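+
+For illustration only, a HorizontalPodAutoscaler targeting the esp Deployment generally looks like the sketch below; the target name, replica range, and CPU threshold here are assumptions, not the contents of the provided esp-e1-hpa.yaml:
+```yaml
+apiVersion: autoscaling/v1
+kind: HorizontalPodAutoscaler
+metadata:
+  name: esp-esp1                     # assumed HPA name
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: esp-esp1                   # assumed esp Deployment name (matches the Pod names above)
+  minReplicas: 1
+  maxReplicas: 4                     # assumed upper bound
+  targetCPUUtilizationPercentage: 50 # assumed scaling threshold
+```
+Note that CPU-based autoscaling only takes effect if the esp container declares CPU resource requests and the cluster runs a metrics source such as metrics-server.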
+You can modify the provided manifest and apply it:
+```console
+kubectl apply -f esp-e1-hpa.yaml
 ```
-configuration scripts, log ile and outputs are under /tmp/
+To exercise it, increase the esp Pod's CPU load (for example, run a big loop) and monitor the auto-scaling.
-
-### Start a load balancer on esp
-When deploy Kubernetes on a cloud such as AWS you can create load balancer for esp
-```sh
-kubectl create -f esp-service.yaml
-```
-Make sure the service is up
-```sh
-kubectl get service
-NAME         CLUSTER-IP    EXTERNAL-IP        PORT(S)    AGE
-esp          10.0.21.220   a2c49b2864c79...   8001/TCP   3h
-kubernetes   10.0.0.1      <none>             443/TCP    3d
+To disable auto-scaling:
+```console
+kubectl delete -f esp-e1-hpa.yaml
 ```
-The "EXTERNAL-IP" is too long.
-```sh
-kubectl get service -o json | grep a2c49b2864c79
-"hostname": "a2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com"
+## Stop/Start Cluster
+Stop:
+```console
+bin/cluster_run.sh stop
 ```
-2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com" and we define the port 8001. so 2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com:8001 should display eclwatch
-
-
-
-
-### Scale thor and roxie replicated pods
-For example, to add one more thor and make total 3 thor slaves:
-```sh
-kubectl scale rc thor --replicas=3
+Start:
+```console
+bin/cluster_run.sh start
 ```
-```Note```: we need more tests on this area, particularly need restart /tmp/run_master.sh to allow re-collect pod ips, generate new environment.xml and stop/start HPCC cluster.
+Get status:
+```console
+bin/cluster_run.sh status
 
-### Stop and delete HPCC cluster
-```sh
-kubectl delete -f esp-service.yaml
-kubectl delete -f thor-controller.yaml
-kubectl delete -f roxie-controller.yaml
-kubectl delete -f esp-controller.yaml
-kubectl delete -f master-controller.yaml
 ```
+
+## Delete Cluster
+```console
+./stop
+```
\ No newline at end of file
diff --git a/Deployment/dp-1/esp-e1-autoscale.yaml b/Deployment/dp-1/esp-e1-hpa.yaml
similarity index 100%
rename from Deployment/dp-1/esp-e1-autoscale.yaml
rename to Deployment/dp-1/esp-e1-hpa.yaml
diff --git a/Deployment/ebs/ebs-1/README.md b/Deployment/ebs/ebs-1/README.md
index b1bdabd..0b862c4 100644
--- a/Deployment/ebs/ebs-1/README.md
+++ b/Deployment/ebs/ebs-1/README.md
@@ -1,3 +1,4 @@
 # Deployment with EBS
 
 Only one PersistentVolumeClaim created per Deployment yaml file
+Each PersistentVolumeClaim must be declared in the Pod spec; a Deployment cannot dynamically create a volume and attach it to a newly scaled-up Pod. Use a StatefulSet instead, unless there is some method we are not aware of.
diff --git a/Deployment/efs/efs-1/README.md b/Deployment/efs/efs-1/README.md
new file mode 100644
index 0000000..919ed79
--- /dev/null
+++ b/Deployment/efs/efs-1/README.md
@@ -0,0 +1,174 @@
+# Deploy Dali/Sasha/DropZone/Roxie/Thor Pods as Deployment/EFS
+
+Generally this is the preferred way to deploy a cluster with EFS, since the typical share mode is ReadWriteMany.
+
+The current deployment runs Sasha/DropZone in the support Pod.
+
+Even though EFS performance may not be as good as EBS, EFS is very convenient:
+- No need to worry about crossing AZs
+- Easy to share and re-use data
+- No need to remember to delete volumes after deleting Pods
+
+EFS is a little more expensive than EBS.
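+
+For reference, the shared ReadWriteMany claim that this deployment consumes has the general shape sketched below. This is only a sketch: the actual objects are created by the efs-provisioner manifests under efs/, and the provisioner name is an assumption; the claim name "efs", storage class "aws-efs", and 1Mi request match the PVC listing shown later in this README.
+```yaml
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: aws-efs                    # storage class referenced in the prerequisites below
+provisioner: example.com/aws-efs   # assumption: provisioner name configured in the efs-provisioner manifest
+---
+kind: PersistentVolumeClaim
+apiVersion: v1
+metadata:
+  name: efs                        # claim name referenced in the prerequisites below
+  annotations:
+    volume.beta.kubernetes.io/storage-class: "aws-efs"
+spec:
+  accessModes:
+    - ReadWriteMany                # the share mode that makes EFS convenient here
+  resources:
+    requests:
+      storage: 1Mi
+```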
+
+## Performance
+To do: compare EFS and EBS.
+
+## Prerequisites
+- Bootstrap
+  ```console
+  bin/bootstrap-aws.sh
+  ```
+- Start the EFS provisioner (NFS)
+  In efs/:
+  ```console
+  ./apply.sh
+  ```
+  apply.sh applies rbac.yaml and manifest.yaml.
+  To display the provisioner pod:
+  ```console
+  kubectl get pods
+  NAME                               READY   STATUS    RESTARTS   AGE
+  efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+  ```
+  To display PV and PVC:
+  ```console
+  kubectl get pv
+  NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM         STORAGECLASS   REASON   AGE
+  pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc   1Mi        RWX            Delete           Bound    default/efs   aws-efs                 2d15h
+
+  kubectl get pvc
+  NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+  efs    Bound    pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc   1Mi        RWX            aws-efs        2d15h
+  ```
+  The Volume Claim name is "efs". The storage class is "aws-efs".
+
+## Deploy HPCC Systems Cluster
+```console
+./start
+```
+To make sure the Pods are up:
+```console
+kubectl get pods
+NAME                               READY   STATUS    RESTARTS   AGE
+dali                               1/1     Running   0          51s
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+esp-esp1-f5bc48677-znlsv           1/1     Running   0          49s
+hpcc-admin                         1/1     Running   0          52s
+roxie-roxie1-84f9578895-fbpld      1/1     Running   0          47s
+roxie-roxie1-84f9578895-gpf6x      1/1     Running   0          47s
+roxie-roxie2-6cf55ffd45-nf2mp      1/1     Running   0          46s
+support-db468c5c9-8kqzd            1/1     Running   0          50s
+thor-thor1-59876665f5-67p4p        1/1     Running   0          44s
+thor-thor1-59876665f5-nn7wc        1/1     Running   0          44s
+thormaster-thor1                   1/1     Running   0          45s
+```
+
+The cluster should be configured and started automatically.
+To verify the status:
+```console
+bin/cluster_run.sh status
+Status of dali:
+mydafilesrv ( pid 972 ) is running ...
+mydali ( pid 1166 ) is running ...
+
+Status of esp-esp1-f5bc48677-znlsv:
+mydafilesrv ( pid 991 ) is running ...
+esp1 ( pid 1185 ) is running ...
+
+Status of roxie-roxie1-84f9578895-fbpld:
+mydafilesrv ( pid 978 ) is running ...
+roxie1 ( pid 1177 ) is running ...
+
+Status of roxie-roxie1-84f9578895-gpf6x:
+mydafilesrv ( pid 978 ) is running ...
+roxie1 ( pid 1177 ) is running ...
+
+Status of roxie-roxie2-6cf55ffd45-nf2mp:
+mydafilesrv ( pid 979 ) is running ...
+roxie2 ( pid 1178 ) is running ...
+
+Status of support-db468c5c9-8kqzd:
+mydafilesrv ( pid 1010 ) is running ...
+mydfuserver ( pid 1204 ) is running ...
+myeclagent ( pid 1413 ) is running ...
+myeclccserver ( pid 1633 ) is running ...
+myeclscheduler ( pid 1851 ) is running ...
+mysasha ( pid 2059 ) is running ...
+
+Status of thor-thor1-59876665f5-67p4p:
+mydafilesrv ( pid 972 ) is running ...
+
+Status of thor-thor1-59876665f5-nn7wc:
+mydafilesrv ( pid 972 ) is running ...
+
+Status of thormaster-thor1:
+mydafilesrv ( pid 978 ) is running ...
+thor1 ( pid 1243 ) is running with 2 slave process(es) ...
+
+```
+
+
+## Access ECLWatch
+Get the esp public ip:
+```console
+kubectl get service
+NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)          AGE
+ew-esp1   LoadBalancer   10.100.248.99   a0e9629f2da3811e9b1b40aa0a5b6276-2050531242.us-east-1.elb.amazonaws.com   8010:30108/TCP   3m52s
+
+```
+ECLWatch URL: http://a0e9629f2da3811e9b1b40aa0a5b6276-2050531242.us-east-1.elb.amazonaws.com:8010
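+
+The ew-esp1 Service above is what provisions the AWS ELB. Its manifest is not reproduced in this patch, but a LoadBalancer Service exposing ECLWatch generally has the following shape; the selector labels below are assumptions, not the actual labels used by the esp Deployment:
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: ew-esp1
+spec:
+  type: LoadBalancer      # asks AWS for an ELB and a public hostname
+  ports:
+  - port: 8010            # ECLWatch port
+    targetPort: 8010
+  selector:
+    app: esp-esp1         # assumption: label selecting the esp Pods
+```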
+
+## Scale up/down
+The roxie-roxie1 Deployment initially has 2 replicas. To increase it to 6 replicas:
+```console
+kubectl scale --replicas 6 Deployment/roxie-roxie1
+
+kubectl get pods
+NAME                               READY   STATUS    RESTARTS   AGE
+dali                               1/1     Running   0          6m10s
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+esp-esp1-f5bc48677-znlsv           1/1     Running   0          6m8s
+hpcc-admin                         1/1     Running   0          6m11s
+roxie-roxie1-84f9578895-7p4qz      1/1     Running   0          29s
+roxie-roxie1-84f9578895-8tpxp      1/1     Running   0          29s
+roxie-roxie1-84f9578895-fbpld      1/1     Running   0          6m6s
+roxie-roxie1-84f9578895-gpf6x      1/1     Running   0          6m6s
+roxie-roxie1-84f9578895-tjlp9      1/1     Running   0          29s
+roxie-roxie1-84f9578895-v62dj      1/1     Running   0          29s
+roxie-roxie2-6cf55ffd45-nf2mp      1/1     Running   0          6m5s
+support-db468c5c9-8kqzd            1/1     Running   0          6m9s
+thor-thor1-59876665f5-67p4p        1/1     Running   0          6m3s
+thor-thor1-59876665f5-nn7wc        1/1     Running   0          6m3s
+thormaster-thor1                   1/1     Running   0          6m4s
+
+```
+To scale it back down:
+```console
+kubectl scale --replicas 2 Deployment/roxie-roxie1
+```
+
+
+## Stop/Start Cluster
+Stop:
+```console
+bin/cluster_run.sh stop
+```
+Start:
+```console
+bin/cluster_run.sh start
+```
+
+Get status:
+```console
+bin/cluster_run.sh status
+
+```
+
+## Delete Cluster
+```console
+./stop
+```
+This does not delete the persistent volumes; clean them up separately, for example with the AWS CLI or the AWS console.
+
+
diff --git a/README.md b/README.md
index 7b3a148..2bbbc30 100644
--- a/README.md
+++ b/README.md
@@ -1,41 +1,67 @@
 # HPCC-Kubernetes
+## Reliable, Scalable HPCC on Kubernetes
+
 This repo has several HPCC Systems Cluster examples on Kubernetes
-## Bootstrap ###
+
+## Prerequisites
+
+Install kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/
+
+
+## Bootstrap
-Bootstrap will grant access permission for Kubernetes APIs as well create configmap for environmet.xml configuration
-Depands on the Kubernetes environment configmap files may be different. Currently there is aws/configmap/hpcc for AWS environment and the other one is local/configmap/hpcc for local deployment.
-In bin directory
+Bootstrap grants access permissions for the Kubernetes APIs and creates a ConfigMap for the environment.xml configuration.
+Depending on the Kubernetes environment, the ConfigMap files may differ. Currently there is aws/configmap/hpcc for the AWS environment and local/configmap/hpcc for local deployment.
+From the repository root:
 ```sh
 # AWS
-./bootstrap-aws.sh
+bin/bootstrap-aws.sh
 # Local
-./bootstrap-local.sh
-or
+bin/bootstrap-local.sh
+or on Windows
 bootstrap.bat
 ```
 User can modify security/cluster_role.yaml and files under configmap/hpcc
 
-### Pod ###
+## Pod
 Deploy HPCC Systems Platform on single node
 Reference [README.md](Pod/README.md)
 
-### Deployment ###
+## Deployment
 Deploy HPCC Systems cluster with Deployment Pod definition.
 Reference [README.md](Deployment/dp-1/README.md)
 
-### StatefulSet ###
+## ReplicationController
+Deploy HPCC Systems cluster with ReplicationController Pod definition.
+It is recommended to use "Deployment" instead.
+
+## [StatefulSet](StatefulSet/README.md)
-Deploy HPCC Systems cluster with StatefulSet Pod definition. It includs ebs and nfs examples
+Deploy HPCC Systems cluster with StatefulSet Pod definition. It includes EBS and EFS examples
 Reference
 . [EBS README.md](StatefulSet/ebs/ebs-1/README.md)
 . [EFS README.md](StatefulSet/efs/efs-1/README.md)
 
-### istio ###
+## istio
 Show some features of ISTIO on local Kubernetes environment
 Reference [README.md](istio/demo/README.md)
 
-### elastic ###
+## charts
+
+Helm Charts for HPCC Systems Cluster (Experimental)
+
+## local
+Local Kubernetes setup instructions
+
+## security
+RBAC settings for Kubernetes environment
+
+## aws
+AWS related settings
+
+## elastic
 Filebeat, Metricbeat, etc example on local Kubernetes environment. Still in progress ... .
diff --git a/RelicationController/rc-1/esp-e1-sf.yaml b/RelicationController/rc-1/esp-e1-sf.yaml deleted file mode 100644 index a87f9ae..0000000 --- a/RelicationController/rc-1/esp-e1-sf.yaml +++ /dev/null @@ -1,40 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: esp-e1 - namespace: default - labels: - app: esp-e1 -spec: - ports: - - port: 8010 - name: esp-e1 - clusterIP: None - selector: - name: esp-e1 ---- -apiVersion: apps/v1 -kind: StatefulSet -metadata: - name: esp-e1 -spec: # specification of the pod’s contents - serviceName: "esp-e1" - replicas: 2 - selector: - matchLabels: - app: esp-e1 - template: - metadata: - labels: - app: esp-e1 - spec: - containers: - - name: esp-e1 - image: "hpccsystems/platform:7.4.8-1" - ports: - - name: eclwatch - containerPort: 8010 - - name: wsecl - containerPort: 8010 - - name: roxie - containerPort: 9876 diff --git a/ReplicationController/rc-1/README.md b/ReplicationController/rc-1/README.md new file mode 100644 index 0000000..0d7c03a --- /dev/null +++ b/ReplicationController/rc-1/README.md @@ -0,0 +1,12 @@ +# Deploy Cluster with ReplicationController +"Deployment" is prefered + +## Start a Cluster +```console +./start +``` + +## Stop a Cluster +```console +./stop +``` \ No newline at end of file diff --git a/RelicationController/rc-1/admin.yaml b/ReplicationController/rc-1/admin.yaml similarity index 100% rename from RelicationController/rc-1/admin.yaml rename to ReplicationController/rc-1/admin.yaml diff --git a/RelicationController/rc-1/esp-e1.yaml b/ReplicationController/rc-1/esp-e1.yaml similarity index 100% rename from RelicationController/rc-1/esp-e1.yaml rename to ReplicationController/rc-1/esp-e1.yaml diff --git a/RelicationController/rc-1/roxie-r1.yaml b/ReplicationController/rc-1/roxie-r1.yaml similarity index 100% rename from RelicationController/rc-1/roxie-r1.yaml rename to ReplicationController/rc-1/roxie-r1.yaml diff --git a/RelicationController/rc-1/start b/ReplicationController/rc-1/start similarity index 100% rename from RelicationController/rc-1/start rename to ReplicationController/rc-1/start diff --git a/RelicationController/rc-1/stop b/ReplicationController/rc-1/stop similarity index 100% rename from RelicationController/rc-1/stop rename to ReplicationController/rc-1/stop diff --git a/RelicationController/rc-1/support.yaml b/ReplicationController/rc-1/support.yaml similarity index 100% rename from RelicationController/rc-1/support.yaml rename to ReplicationController/rc-1/support.yaml diff --git a/RelicationController/rc-1/thor-t1.yaml b/ReplicationController/rc-1/thor-t1.yaml similarity index 100% rename from RelicationController/rc-1/thor-t1.yaml rename to ReplicationController/rc-1/thor-t1.yaml diff --git a/RelicationController/rc-1/thormaster-t1.yaml b/ReplicationController/rc-1/thormaster-t1.yaml similarity index 100% rename from RelicationController/rc-1/thormaster-t1.yaml rename to ReplicationController/rc-1/thormaster-t1.yaml diff --git a/StatefulSet/README.md b/StatefulSet/README.md new file mode 100644 index 0000000..0c38726 --- /dev/null +++ b/StatefulSet/README.md @@ -0,0 +1,8 @@ +# Deploy HPCC Systems cluster with StatefulSet + +## Deployment Types + + - [EBS](ebs/ebs-1/README.md) + - [EFS](efs/efs-1/README.md) + +## Performance Comparison \ No newline at end of file diff --git a/StatefulSet/ebs/ebs-1/README.md b/StatefulSet/ebs/ebs-1/README.md index 8870d94..86f842c 100644 --- a/StatefulSet/ebs/ebs-1/README.md +++ b/StatefulSet/ebs/ebs-1/README.md @@ -1,33 +1,42 @@ -# Deploy Roxie/Thor Pods as 
StatefulSet/EBS +# Deploy Dali/Sasha/DropZone/Roxie/Thor Pods as StatefulSet/EBS -### Prerequisites ### -```sh -bin/bootstrap.sh +Current deployment has Sasha/DropZone in support Pod. + +StatefulSet gives capibility to dynamically to create EBS volume to attache to newly added Pod/Container. It uses volumeClaimTemplate and StorageClass to achieve this. +Delete VolumeClaim will automatically remove the volumes from EC2 otherwise user is responible to clean volumes in EC2. + +Compare EBS with EFS, EBS supports to have better performance but when cluster deployed cross multiple AZs it may be difficult to re-use these volumes. + + +## Prerequisities +```console +bin/bootstrap-aws.sh ``` -### Deploy HPCC Sysstems Cluster -```sh +## Deploy HPCC Systems Cluster +```console ./start ``` To make sure they are up: -```sh +```console kubectl get pods - -NAME READY STATUS RESTARTS AGE -esp-controller-bbgqu 1/1 Running 0 5m -esp-controller-wc8ae 1/1 Running 0 5m -master-controller-wa5z8 1/1 Running 0 5m -roxie-controller-hmvo5 1/1 Running 0 5m -roxie-controller-x7ksh 1/1 Running 0 5m -thor-controller-2sbe5 1/1 Running 0 5m -thor-controller-p1q7f 1/1 Running 0 5m -kubectl get pods - +NAME READY STATUS RESTARTS AGE +dali 1/1 Running 0 100s +efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d14h +esp-esp1-69b59769bd-hmzss 1/1 Running 0 98s +hpcc-admin 1/1 Running 0 101s +roxie-roxie1-0 1/1 Running 0 97s +roxie-roxie1-1 1/1 Running 0 68s +roxie-roxie2-0 1/1 Running 0 96s +support-0 1/1 Running 0 99s +thor-thor1-0 1/1 Running 0 94s +thor-thor1-1 1/1 Running 0 74s +thormaster-thor1 1/1 Running 0 95s ``` The cluster should be automatically configured and started. To verify the status -```sh +```console bin/cluster_run.sh status Status of dali: @@ -70,9 +79,9 @@ thor1 ( pid 4208 ) is running with 2 slave process(es) ... ``` -### Access ECLWatch ### +## Access ECLWatch ### Get esp public ic: -```sh +```console kubectl get service NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE @@ -86,9 +95,9 @@ svc-thor-thor1 ClusterIP None ``` ECLWatch URL: http://a88781525d33911e9a3780efce698321-1790757551.us-east-1.elb.amazonaws.com:8010 -### Scale up/down ### +## Scale up/down ### Original roxie-roxie1 cluster has 2 instances. To increase it to 4 instances: -```sh +```console kubeclt scale --replicas 4 StatefulSet/roxie-roxie1 kubeclt get pods @@ -110,29 +119,29 @@ thormaster-thor1 1/1 Running 0 76m ``` To scale it back -```sh -kubeclt scale --replicas 4 StatefulSet/roxie-roxie1 +```console +kubeclt scale --replicas 2 StatefulSet/roxie-roxie1 ``` -### Stop/Start Cluster ### +## Stop/Start Cluster stop -```ah +```console bin/cluster-run stop ``` start -```ah +```console bin/cluster-run start ``` Get status -```sh +```console bin/cluster-run status ``` -### Delete Cluster ### -```sh +## Delete Cluster ### +```console ./stop ``` This does not delete volumes. Either use AWS Client or go to EC2 console to delete them. diff --git a/StatefulSet/efs/efs-1/README.md b/StatefulSet/efs/efs-1/README.md index 9cd53c0..ae3139f 100644 --- a/StatefulSet/efs/efs-1/README.md +++ b/StatefulSet/efs/efs-1/README.md @@ -1,3 +1,131 @@ -https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs -kubectl.sh apply -f rbac.yaml -kubectl.sh apply -f manifest.yaml +# Deploy Dali/Sasha/DropZone/Roxie/Thor Pods as StatefulSet/EFS + +Current deployment has Sasha/DropZone in support Pod. +Attach ReadWriteMany EFS to Pods doesn't need StatefulSet. See Deployment/efs/efs-1/README.md. 
+For ReadWriteOnce EFS volumes, however, a StatefulSet is required; that is what this setup demonstrates.
+
+
+## Prerequisites
+- Bootstrap
+  ```console
+  bin/bootstrap-aws.sh
+  ```
+- Start the EFS provisioner (NFS)
+  In efs/:
+  ```console
+  ./apply.sh
+  ```
+  apply.sh applies rbac.yaml and manifest.yaml.
+  To display the provisioner pod:
+  ```console
+  kubectl get pods
+  NAME                               READY   STATUS    RESTARTS   AGE
+  efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+  ```
+  To display PV and PVC:
+  ```console
+  kubectl get pv
+  NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM         STORAGECLASS   REASON   AGE
+  pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc   1Mi        RWX            Delete           Bound    default/efs   aws-efs                 2d15h
+
+  kubectl get pvc
+  NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+  efs    Bound    pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc   1Mi        RWX            aws-efs        2d15h
+  ```
+  The Volume Claim name is "efs". The storage class is "aws-efs".
+
+## Deploy HPCC Systems Cluster
+```console
+./start
+```
+To make sure the Pods are up:
+```console
+kubectl get pods
+NAME                               READY   STATUS    RESTARTS   AGE
+dali                               1/1     Running   0          71s
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+esp-esp1-f5bc48677-p5hgs           1/1     Running   0          69s
+hpcc-admin                         1/1     Running   0          72s
+roxie-roxie1-0                     1/1     Running   0          68s
+roxie-roxie1-1                     1/1     Running   0          58s
+roxie-roxie2-0                     1/1     Running   0          67s
+support-0                          1/1     Running   0          70s
+thor-thor1-0                       1/1     Running   0          64s
+thor-thor1-1                       1/1     Running   0          60s
+thormaster-thor1                   1/1     Running   0          66s
+```
+
+To get PersistentVolumeClaims:
+```console
+kubectl get pvc
+NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+efs                  Bound    pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc   1Mi        RWX            aws-efs        2d15h
+efs-roxie-roxie1-0   Bound    pvc-78b19802-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m13s
+efs-roxie-roxie1-1   Bound    pvc-7e418789-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m3s
+efs-roxie-roxie2-0   Bound    pvc-794e58c9-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m12s
+efs-support-0        Bound    pvc-77771f35-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m15s
+efs-thor-thor1-0     Bound    pvc-7a852ef7-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m9s
+efs-thor-thor1-1     Bound    pvc-7d1062ef-da34-11e9-87ee-0e00576dcdfc   1Mi        RWO            aws-efs        7m5s
+```
+To check the cluster status:
+```console
+bin/cluster_run.sh status
+```
+## Access ECLWatch
+Get the esp public ip:
+```console
+kubectl get services
+NAME               TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)          AGE
+ew-esp1            LoadBalancer   10.100.244.14   a780e910ada3411e9b1b40aa0a5b6276-615333305.us-east-1.elb.amazonaws.com   8010:30414/TCP   4m38s
+kubernetes         ClusterIP      10.100.0.1      <none>                                                                   443/TCP          3d4h
+svc-roxie-roxie1   ClusterIP      None            <none>                                                                   <none>           4m37s
+svc-roxie-roxie2   ClusterIP      None            <none>                                                                   <none>           4m36s
+svc-support        ClusterIP      None            <none>                                                                   <none>           4m39s
+svc-thor-thor1     ClusterIP      None            <none>                                                                   <none>           4m34s
+```
+
+To access ECLWatch: http://a780e910ada3411e9b1b40aa0a5b6276-615333305.us-east-1.elb.amazonaws.com:8010
+
+## Scale up/down
+Scale up:
+```console
+kubectl scale --replicas 4 StatefulSet/roxie-roxie1
+
+kubectl get pods
+NAME                               READY   STATUS    RESTARTS   AGE
+dali                               1/1     Running   0          10m
+efs-provisioner-57965c4946-7w4b5   1/1     Running   0          2d15h
+esp-esp1-f5bc48677-p5hgs           1/1     Running   0          10m
+hpcc-admin                         1/1     Running   0          10m
+roxie-roxie1-0                     1/1     Running   0          10m
+roxie-roxie1-1                     1/1     Running   0          10m
+roxie-roxie1-2                     1/1     Running   0          12s
+roxie-roxie1-3                     0/1     Pending   0          5s
+roxie-roxie2-0                     1/1     Running   0          10m
+support-0                          1/1     Running   0          10m
+thor-thor1-0                       1/1     Running   0          10m
+thor-thor1-1                       1/1     Running   0          10m
+thormaster-thor1                   1/1     Running   0          10m
+```
+Notice that scaling up with EFS is much faster than with EBS, since no new volumes need to be created.
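+
+The per-Pod claims listed above (efs-roxie-roxie1-0, efs-roxie-roxie1-1, and so on) are what a StatefulSet volumeClaimTemplates section produces. The StatefulSet manifests used here are not reproduced in this patch; the fragment below is only a sketch of that section, with the template name "efs" inferred from the claim names above and everything else an assumption:
+```yaml
+  volumeClaimTemplates:
+  - metadata:
+      name: efs                    # yields one PVC named efs-<pod-name> per replica
+      annotations:
+        volume.beta.kubernetes.io/storage-class: "aws-efs"
+    spec:
+      accessModes: [ "ReadWriteOnce" ]
+      resources:
+        requests:
+          storage: 1Mi
+```
+Because each replica gets its own ReadWriteOnce claim, scaling the StatefulSet up automatically binds a new claim for every new Pod.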
+ +Scale down +```console +kubeclt scale --replicas 1 StatefulSet/roxie-roxie1 +``` +## Stop/Start Cluster +stop +```console +bin/cluster-run stop +``` +start +```console +bin/cluster-run start +``` + +Get status +```console +bin/cluster-run status + +``` + +## Delete Cluster ### +```console +./stop +``` diff --git a/aws/README.md b/aws/README.md index 2edc5b3..ae49d4e 100644 --- a/aws/README.md +++ b/aws/README.md @@ -1,223 +1,15 @@ -# HPCC-Kubernetes -## Reliable, Scaleble HPCC on Kubernetes +# AWS Settings for HPCC Systems Kubernetes -The following document describe the deployment of a reliable single node or cluster HPCC on Kubernetes. It uses HPCC 5.x docker images and still in experinment stage. We hope with HPCC 6.0.0 it can have a HPCC cluster with dali, esp, thor, roxie and other supporting componments on each individual controller. esp nodes will have a service for load-balance. -## Prerequisites +## [EKS](./EKS/README.md) +Elastic Kubernetes Service (EKS) -Install kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/ +## configmap +Properites files for generate environment.xml +## nfs +An old Replicationcontroller with NFS sample. Just keep it here for reference -## Deploy a single HPCC pod -### Turning up an HPCC Platform single node -A [_Pod_](https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pods.md) is one or more containers that _must_ be scheduled onto the same host. All containers in a pod share a network namespace, and may optionally share mounted volumes. +## volumes +gp2 storage-class is already defined in EKS environment -Here is the config for the hpcc platform pod: [hpcc.yaml](hpcc.yaml) - -Create HPCC Platfrom node as follow: -The current default hpcc pode use HPCC 5.4.8-1 on Ubuntu 14.04 amd64 trusty. You can change to other [HPCC docker images](https://hub.docker.com/r/hpccsystems/platform-ce/) or [build a HPCC docker image](https://github.com/xwang2713/HPCC-Docker) youself. -```sh -kubectl create -f hpcc.yaml -``` -For single node deployment HPCC is not started. you can start it as: -```sh -kubectl exec hpcc -- /etc/init.d/hpcc-init start -Starting mydafilesrv ... [ OK ] -Starting mydali ... [ OK ] -Starting mydfuserver ... [ OK ] -Starting myeclagent ... [ OK ] -Starting myeclccserver ... [ OK ] -Starting myeclscheduler ... [ OK ] -Starting myesp ... [ OK ] -Starting myroxie ... [ OK ] -Starting mysasha ... [ OK ] -Starting mythor ... [ OK ] -``` -You also can access the contain to run commands: -```sh -kubectl exec -i -t hpcc -- bash -il -``` -Type "exit" to exit it. - -Tt -exit - the HPCC node ip: -```sh -kubectl get pod hpcc -o json | grep podIP - "podIP": "172.17.0.2", -``` -or -```sh -kubectl describe pod hpcc | grep "IP:" -IP: 172.17.0.2 -``` -You can access ECLWatch from browser: ```hpcc://172.17.0.2:8010``` - -Pod ip (172.17.0.2) is private. If can't reach it you can try ssh tunnel to the host Linux: -```sh -ssh -L 8010:172.17.0.2:8010 @ -``` -Now you can access ECLWatch from your local broswer: ```hpcc://localhost:8010``` - -### Stop and delete the HPCC pod -```sh -kubectl delete -f hpcc.yaml -``` - -## Deploy a HPCC Cluster with Kubernetes Controller -In Kubernetes a [_Replication Controller_](../../docs/user-guide/replication-controller.md) is responsible for replicating sets of identical pods. Like a _Service_ it has a selector query which identifies the members of it's set. 
Unlike a _Service_ it also has a desired number of replicas, and it will create or delete _Pods_ to ensure that the number of _Pods_ matches up with it's desired state. - -Replication Controllers will "adopt" existing pods that match their selector query, so let's create a Replication Controller with a single replica to adopt our existing Redis server. Here are current HPCC the replication controller config: [master-controller.yaml](master-controller.yaml), [thor-controller.yaml](thor-controller.yaml), [roxie-controller.yaml](roxie-controller.yaml). [esp-controller.yaml](esp-controller.yaml). In future we want to further divid master configuration to dali, sasha and rest support components. - - -###Turn up thor instances -```sh -kubectl create -f thor-controller.yaml -``` -The default thor-controller define two thor slaves. -To make sure they are up: -```sh -kubectl get rc thor-controller -NAME DESIRED CURRENT AGE -thor-controller 2 2 1m -``` - -### Turn up roxie instances -```sh -kubectl create -f roxie-controller.yaml -``` -The default roxie-controller define two roxie instance. -To make sure they are up: -```sh -kubectl get rc roxie-controller -NAME DESIRED CURRENT AGE -roxie-controller 2 2 2m -``` - -### Turn up esp instances -```sh -kubectl create -f esp-controller.yaml -``` -The default esp-controller define two roxie instance. -To make sure they are up: -```sh -kubectl get rc esp-controller -NAME DESIRED CURRENT AGE -esp-controller 2 2 2m -``` - -### Turn up master instance -The master instance includs HPCC support components. It should be started after thor and roxie are up and ready. It will collect all ips ,configure and start the cluster. -To verify the thor and roxie are ready: -```sh -kubectl get pods -NAME READY STATUS RESTARTS AGE -esp-controller-bbgqu 1/1 Running 0 3m -esp-controller-wc8ae 1/1 Running 0 3m -roxie-controller-hmvo5 1/1 Running 0 3m -roxie-controller-x7ksh 1/1 Running 0 3m -thor-controller-2sbe5 1/1 Running 0 3m -thor-controller-p1q7f 1/1 Running 0 3m -``` -To start master instance: -```sh -kubectl create -f master-controller.yaml -``` -Make sure it is up and ready: -```sh -kubectl get rc master-controller -NAME DESIRED CURRENT AGE -master-controller 1 1 12h - -kubectl get pods -NAME READY STATUS RESTARTS AGE -esp-controller-bbgqu 1/1 Running 0 5m -esp-controller-wc8ae 1/1 Running 0 5m -master-controller-wa5z8 1/1 Running 0 5m -roxie-controller-hmvo5 1/1 Running 0 5m -roxie-controller-x7ksh 1/1 Running 0 5m -thor-controller-2sbe5 1/1 Running 0 5m -thor-controller-p1q7f 1/1 Running 0 5m - - -### Access ECLWatch and Verify the cluster -Get mastr ip: -```sh -kubectl get pod master-controller-ar6jn -o json | grep podIP - "podIP": "172.17.0.5", -``` -If everything run OK you should access ECLWatch to verify the configuration: ```http://172.17.0.5:8010```. Again if you can't access the private ip you can try to tunnel it above described in deploy single HPCC instance. - -If something go wrong you can access the master instance: -```sh -kubectl exec master-controller-ar6jn -i -t -- bash -il -``` -configuration scripts, log ile and outputs are under /tmp/ - - -### Start a load balancer on esp -When deploy Kubernetes on a cloud such as AWS you can create load balancer for esp -```sh -kubectl create -f esp-service.yaml -``` -Make sure the service is up -```sh -kubectl get service -NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE -esp 10.0.21.220 a2c49b2864c79... 8001/TCP 3h -kubernetes 10.0.0.1 443/TCP 3d -``` - -The "EXTERNAL-IP" is too long. 
-```sh -kubectl get service -o json | grep a2c49b2864c79 -"hostname": "a2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com" -``` -2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com" and we define the port 8001. so 2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com:8001 should display eclwatch - - - - -### Scale thor and roxie replicated pods -For example, to add one more thor and make total 3 thor slaves: -```sh -kubectl scale rc thor --replicas=3 -``` - -```Note```: we need more tests on this area, particularly need restart /tmp/run_master.sh to allow re-collect pod ips, generate new environment.xml and stop/start HPCC cluster. - -### Stop and delete HPCC cluster -```sh -kubectl delete -f esp-service.yaml -kubectl delete -f thor-controller.yaml -kubectl delete -f roxie-controller.yaml -kubectl delete -f esp-controller.yaml -kubectl delete -f master-controller.yaml -``` - -### Known issues -1. Even create thor containers but thor slaves will try to deployed from the first non-master containers instead of the first thor container. This probably can be fixed by entries in genrules.conf or need wait for HPCC 6.0.0. - -2. Roxie fails to start in cluster environenment. This is no WMEM_MAX and RMEM_MAX resources in the container environenment. These buffer size setting should be configured on the host system. In HPCC 6.0.0 we will skip the checking on containers and document this. We do need to test network performance and give some guidanse for the buffer size setting on the host. - - - -### New Instruction ### -1. create configmap: - From configmap directory - kubectl.sh create configmap hpcc-config --from-file=hpcc/ - kubectl.sh get configmap - -2. grant access permission: - From security directory - kubectl.sh apply -f (resource).yml - for get_pods.py need: kubectl.sh apply -f cluster_role.yaml - -3. Start pods - ./start - -4. Stop pods - ./stop - -5. ssh to pod - kubectl exec -i -t -- bash -il diff --git a/bin/cluster_run.ps1 b/bin/cluster_run.ps1 index b2316a0..a27495e 100644 --- a/bin/cluster_run.ps1 +++ b/bin/cluster_run.ps1 @@ -11,14 +11,14 @@ Get status/start/stop HPCC Systems Cluster .Example ./cluster_run.ps1 -action status -.NOTES - +. .LINK https://github.com/xwang2713/HPCC-Kubernetes #> param( + $admin_pod="hpcc-admin", $namespace="default", $component="", $pod_name="", @@ -27,9 +27,9 @@ param( $wkDir = split-path $myInvocation.MyCommand.path -$KUBECTL="kubectl.exe" +$KUBECTL = "kubectl.exe" cd $wkDir -function get_cluster_status +function get_cluster_status() { foreach ( $pod in (./cluster_query.ps1)) { @@ -38,56 +38,42 @@ function get_cluster_status } } -function runHPCC +function runHPCC ($a) { if ( "$pod_name" -eq "" ) { return 1 } - $cmd="${KUBECTL} exec $pod_name /etc/init.d/hpcc-init" + $cmd = "${KUBECTL} exec $pod_name /etc/init.d/hpcc-init" if ( $comp_name ) { - $cmd="$cmd -c $comp_name" + $cmd = "$cmd -c $comp_name" } - $cmd="$cmd $1" + $cmd = "$cmd $a" "$cmd" iex "$cmd" } -function runHPCCCluster +function runHPCCCluster ( $a ) { - @" - -############################################### -# -# $1 HPCC Cluster ... 
-# -############################################### - "@ - $cmd="${KUBECTL} exec $admin_pod /opt/hpcc-tools/$1_hpcc.sh" + + $cmd = "${KUBECTL} exec $admin_pod /opt/hpcc-tools/${a}_hpcc.sh" "$cmd" iex "$cmd" - @" -############################################### -# -# Status: -# -############################################### -"@ - get_cluster_status + get_cluster_status } -function stxxxHPCC +function stxxHPCC ($a) { if ( $action -ieq "restart" -or ! $pod_name ) { - runHPCCCluster $1 + runHPCCCluster $a return } - runHPCC $1 + runHPCC $a } @@ -96,12 +82,38 @@ switch ( $action ) { "status" { - get_clster_status + if (! $pod_name) + { + get_cluster_status + } + else + { + runHPCC $action + } + } "start" { - "start" + if ( $component -ieq "configmgr") + { + kubectl.exe exec -it $admin_pod /opt/HPCCSystems/sbin/configmgr + } + else + { + stxxHPCC "start" + } + } + "stop" + { + stxxHPCC "stop" } - "stop" {"stop"} - "restart" { "restart"} + "restart" + { + stxxHPCC "stop" + stxxHPCC "start" + } + "*" + { + "Unknown action $action" + } } diff --git a/local/APPLE.md b/local/APPLE.md deleted file mode 100644 index e69de29..0000000 diff --git a/local/LINUX.md b/local/LINUX.md index e69de29..21169f5 100644 --- a/local/LINUX.md +++ b/local/LINUX.md @@ -0,0 +1,12 @@ +# Getting started locally on Linux + +### Brief Instructions +- Install Docker CE https://docs.docker.com/install/ +- Install Go https://golang.org/doc/install +- git clone --depth=1 https://github.com/kubernetes/kubernetes.git +- cd to Kubernetes and run as root: + ```console + ./hack/local-up-cluster.sh + ``` +### Reference +https://github.com/kubernetes/community/blob/master/contributors/devel/running-locally.md diff --git a/local/MACOS.md b/local/MACOS.md new file mode 100644 index 0000000..a916718 --- /dev/null +++ b/local/MACOS.md @@ -0,0 +1,10 @@ +# Getting started locally on MacOS + +### Brief Instructions +- Install Docker CE https://docs.docker.com/install/ +- Install Go https://golang.org/doc/install +- git clone --depth=1 https://github.com/kubernetes/kubernetes.git +- cd to Kubernetes and run as root: ./hack/local-up-cluster.sh + +### Reference +https://github.com/kubernetes/community/blob/master/contributors/devel/running-locally.md diff --git a/local/MINIKUBE.md b/local/MINIKUBE.md index 304c15e..e2e7757 100644 --- a/local/MINIKUBE.md +++ b/local/MINIKUBE.md @@ -5,24 +5,20 @@ Installation: https://kubernetes.io/docs/tasks/tools/install-minikube/ Start minikube - 1) minikube start -p minikube - 2) change setting: - 2.1) minikube stop -p minikube - 2.2) minikube.exe --vm-driver=virtualbox --cpus 4 --disk-size 100g --memory 8192 start -p minikube + 1. minikube start -p minikube + 2. change setting: + 1. minikube stop -p minikube + 2. minikube.exe --vm-driver=virtualbox --cpus 4 --disk-size 100g --memory 8192 start -p minikube ## bootstrap This will setup ConfigMap and grant security permissions ```sh cd bin/ -./bootstrap.sh +./bootstrap-local.sh +# or on Windows +./bootstrap.bat ``` ## Start and Stop Cluster All following assume in directory of github project HPCC-Kubernetes. -The Kubernetes client wraper is "kubectl.sh" on Unix and "kubectl.exe" on Windows. In some providers it can be just named "kubectl". "kubectl.sh" is used here. - -Following Development are supported on Minikube - -- simple-rc -- simple-dp -- elastic +The Kubernetes client wraper is "kubectl.sh" on Unix and "kubectl.exe" on Windows. In some providers it can be just named "kubectl". "kubectl.sh" is used here. 
\ No newline at end of file diff --git a/local/README.md b/local/README.md index 068107c..9979e0b 100644 --- a/local/README.md +++ b/local/README.md @@ -1,27 +1,12 @@ ## Setup local Kubernetes environments -. Kubernetes on local Linux -. Minikube - -## Install Kubernetes on local Linux - -This document assumes that you have a Kubernetes cluster installed and running, and that you have installed the ```kubectl``` command line tool somewhere in your path. Please see https://github.com/kubernetes/kubernetes for installation instructions for your platform. We currenly only test on local Linux setup (should replace following '''kubectl''' with '''cluster/kubectl.sh''' in kubernetes package directory) and will test on AWS soon. - - -### Start Kubernetes -as root user run: -```console -hack/local-up-cluster.sh -``` - -## Install Minikube on Linux - -## Install Minikube on Windows +- Kubernetes on local [Linux](LINUX.md)/[MacOS](MACOS.md)/[Windows](WINDOWS.md) +- [Minikube](MINIKUBE.md) ## Tested deployments on local Kubernetes: -. Pod -. Deployment/dp-1 -. local/hpcc_dns -. istio -. elastic +- Pod +- Deployment/dp-1 +- local/hpcc_dns +- istio +- elastic diff --git a/local/WINDOWS.md b/local/WINDOWS.md index e69de29..b5cfa82 100644 --- a/local/WINDOWS.md +++ b/local/WINDOWS.md @@ -0,0 +1 @@ +Wait for WSL2 release \ No newline at end of file diff --git a/security/README b/security/README.md similarity index 100% rename from security/README rename to security/README.md