Showing 27 changed files with 587 additions and 448 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,97 +1,110 @@ | ||
# HPCC-Kubernetes | ||
|
||
## Deploy a HPCC Cluster with Kubernetes Deployment | ||
In Kubernetes a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) is responsible for replicating sets of identical pods. Like a _Service_, it has a selector query that identifies the members of its set. Unlike a _Service_, it also has a desired number of replicas, and it creates or deletes _Pods_ so that the number of _Pods_ matches its desired state. | ||
|
||
Make sure bin/bootstrap.[sh|bat] has been run first. | ||
|
||
```sh | ||
# Deploy HPCC Systems Cluster with Deployment | ||
|
||
This is a simple stateless deployment scenario. It can be used both locally and on a real cloud such as AWS. | ||
|
||
## Prerequisites | ||
- Bootstrap | ||
AWS: | ||
```console | ||
bin/bootstrap-aws.sh | ||
``` | ||
Local: | ||
```console | ||
bin/bootstrap-local.sh | ||
``` | ||
## Deploy HPCC Systems Cluster | ||
```console | ||
./start | ||
``` | ||
To verify that thor and roxie are ready: | ||
```sh | ||
To make sure they are up: | ||
```console | ||
kubectl get pods | ||
|
||
NAME READY STATUS RESTARTS AGE | ||
esp-controller-bbgqu 1/1 Running 0 3m | ||
esp-controller-wc8ae 1/1 Running 0 3m | ||
roxie-controller-hmvo5 1/1 Running 0 3m | ||
roxie-controller-x7ksh 1/1 Running 0 3m | ||
thor-controller-2sbe5 1/1 Running 0 3m | ||
thor-controller-p1q7f 1/1 Running 0 3m | ||
NAME READY STATUS RESTARTS AGE | ||
efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d16h | ||
esp-esp1-69b59769bd-94gm4 1/1 Running 0 16s | ||
hpcc-admin 1/1 Running 0 19s | ||
roxie-roxie1-64d49d76cf-b28gh 1/1 Running 0 15s | ||
support-778c8ffbb-p44t7 1/1 Running 0 17s | ||
thor-thor1-75bb466cbf-skqj5 1/1 Running 0 13s | ||
thormaster-thor1 1/1 Running 0 14s | ||
``` | ||
To start master instance: | ||
```sh | ||
kubectl create -f master-controller.yaml | ||
The cluster should be automatically configured and started. | ||
To verify the status | ||
```console | ||
bin/cluster_run.sh status | ||
Status of esp-esp1-69b59769bd-94gm4: | ||
mydafilesrv ( pid 981 ) is running ... | ||
esp1 ( pid 1175 ) is running ... | ||
|
||
Status of roxie-roxie1-64d49d76cf-b28gh: | ||
mydafilesrv ( pid 969 ) is running ... | ||
roxie1 ( pid 1168 ) is running ... | ||
|
||
Status of support-778c8ffbb-p44t7: | ||
mydafilesrv ( pid 1006 ) is running ... | ||
mydali ( pid 1200 ) is running ... | ||
mydfuserver ( pid 1413 ) is running ... | ||
myeclagent ( pid 1629 ) is running ... | ||
myeclccserver ( pid 1832 ) is running ... | ||
myeclscheduler ( pid 2049 ) is running ... | ||
mysasha ( pid 2255 ) is running ... | ||
|
||
Status of thor-thor1-75bb466cbf-skqj5: | ||
mydafilesrv ( pid 962 ) is running ... | ||
|
||
Status of thormaster-thor1: | ||
mydafilesrv ( pid 969 ) is running ... | ||
thor1 ( pid 1214 ) is running with 1 slave process(es) ... | ||
``` | ||
|
||
## Scale up/down | ||
The original roxie-roxie1 cluster has 1 instance. To increase it to 2 instances: | ||
```console | ||
kubectl scale --replicas 2 Deployment/roxie-roxie1 | ||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d16h | ||
esp-esp1-69b59769bd-94gm4 1/1 Running 0 3m15s | ||
hpcc-admin 1/1 Running 0 3m18s | ||
roxie-roxie1-64d49d76cf-b28gh 1/1 Running 0 3m14s | ||
roxie-roxie1-64d49d76cf-mflqn 1/1 Running 0 11s | ||
support-778c8ffbb-p44t7 1/1 Running 0 3m16s | ||
thor-thor1-75bb466cbf-skqj5 1/1 Running 0 3m12s | ||
thormaster-thor1 1/1 Running 0 3m13s | ||
``` | ||
Make sure it is up and ready: | ||
```sh | ||
kubectl get rc master-controller | ||
NAME DESIRED CURRENT AGE | ||
master-controller 1 1 12h | ||
|
||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
esp-controller-bbgqu 1/1 Running 0 5m | ||
esp-controller-wc8ae 1/1 Running 0 5m | ||
master-controller-wa5z8 1/1 Running 0 5m | ||
roxie-controller-hmvo5 1/1 Running 0 5m | ||
roxie-controller-x7ksh 1/1 Running 0 5m | ||
thor-controller-2sbe5 1/1 Running 0 5m | ||
thor-controller-p1q7f 1/1 Running 0 5m | ||
|
||
|
||
### Access ECLWatch and Verify the cluster | ||
Get the master IP: | ||
```sh | ||
kubectl get pod master-controller-ar6jn -o json | grep podIP | ||
"podIP": "172.17.0.5", | ||
To scale it back | ||
```console | ||
kubectl scale --replicas 1 Deployment/roxie-roxie1 | ||
``` | ||
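The scale command returns before the new Pods are ready. A minimal sketch of a helper that scales a Deployment and then waits for the rollout to finish is shown here. The function name `scale_and_wait` is hypothetical and not part of this repository; it only wraps the standard `kubectl scale` and `kubectl rollout status` commands.

```shell
# Sketch only: scale_and_wait is a hypothetical helper, not a script in this repo.
# Usage: scale_and_wait <deployment-name> <replica-count>
scale_and_wait() {
    name="$1"
    replicas="$2"
    # Ask Kubernetes for the new replica count, then block until the
    # Deployment reports that the rollout has completed.
    kubectl scale --replicas="$replicas" "deployment/$name" &&
        kubectl rollout status "deployment/$name"
}
```

For example, `scale_and_wait roxie-roxie1 2` scales roxie-roxie1 to 2 replicas and returns once both Pods are available.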
If everything runs OK, you should be able to access ECLWatch to verify the configuration: ```http://172.17.0.5:8010```. Again, if you can't access the private IP, you can tunnel to it as described above for deploying a single HPCC instance. | ||
|
||
If something goes wrong, you can access the master instance: | ||
```sh | ||
kubectl exec master-controller-ar6jn -i -t -- bash -il | ||
## Auto-scaling | ||
A sample autoscaling YAML file is provided. You can modify it and apply it: | ||
```console | ||
kubectl apply -f esp-e1-hpa.yaml | ||
``` | ||
Configuration scripts, log files, and outputs are under /tmp/ | ||
To trigger auto-scaling, increase the esp Pod CPU load, for example by running a big loop, and monitor the scaling. | ||
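One way to generate the CPU load is a busy loop inside the esp Pod. The sketch below is an illustration under assumptions: the helper name `burn_cpu` is hypothetical, the Pod name must be looked up with `kubectl get pods` first, and the container image is assumed to provide `sh` and the coreutils `timeout` command.

```shell
# Sketch only: burn_cpu is a hypothetical helper, not a script in this repo.
# Usage: burn_cpu <pod-name> [seconds]   (default: 60 seconds)
burn_cpu() {
    pod="$1"
    seconds="${2:-60}"
    # Run an empty loop in the Pod until `timeout` stops it.
    # The trailing `true` hides the non-zero exit status that timeout returns.
    kubectl exec "$pod" -- sh -c "timeout $seconds sh -c 'while :; do :; done'; true"
}
```

While the loop runs, `kubectl get hpa --watch` in a second terminal shows the observed CPU utilization and the replica count chosen by the autoscaler.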
|
||
### Start a load balancer on esp | ||
When deploying Kubernetes on a cloud such as AWS, you can create a load balancer for esp: | ||
```sh | ||
kubectl create -f esp-service.yaml | ||
``` | ||
Make sure the service is up | ||
```sh | ||
kubectl get service | ||
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE | ||
esp 10.0.21.220 a2c49b2864c79... 8001/TCP 3h | ||
kubernetes 10.0.0.1 <none> 443/TCP 3d | ||
To disable auto-scaling: | ||
```console | ||
kubectl delete -f esp-e1-hpa.yaml | ||
``` | ||
|
||
The "EXTERNAL-IP" is too long to be shown in full. To get the complete hostname: | ||
```sh | ||
kubectl get service -o json | grep a2c49b2864c79 | ||
"hostname": "a2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com" | ||
## Stop/Start Cluster | ||
stop | ||
```console | ||
bin/cluster-run stop | ||
``` | ||
The hostname is a2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com and the port is defined as 8001, so a2c49b2864c7911e6ab6506c30bb0563-401114081.eu-west-1.elb.amazonaws.com:8001 should display ECLWatch. | ||
### Scale thor and roxie replicated pods | ||
For example, to add one more thor for a total of 3 thor slaves: | ||
```sh | ||
kubectl scale rc thor --replicas=3 | ||
start | ||
```console | ||
bin/cluster-run start | ||
``` | ||
|
||
```Note```: this area needs more testing. In particular, /tmp/run_master.sh needs to be restarted so that it re-collects the pod IPs, generates a new environment.xml, and stops/starts the HPCC cluster. | ||
Get status | ||
```console | ||
bin/cluster-run status | ||
|
||
### Stop and delete HPCC cluster | ||
```sh | ||
kubectl delete -f esp-service.yaml | ||
kubectl delete -f thor-controller.yaml | ||
kubectl delete -f roxie-controller.yaml | ||
kubectl delete -f esp-controller.yaml | ||
kubectl delete -f master-controller.yaml | ||
``` | ||
|
||
## Delete Cluster ### | ||
```console | ||
./stop | ||
``` |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
# Deployment with EBS | ||
Only one PersistentVolumeClaim is created per Deployment YAML file. | ||
|
||
Each PersistentVolumeClaim must be provided in the Pod. Volumes can't be dynamically created and attached to scaled-up Pods automatically. Use StatefulSet instead, unless there are methods we are not aware of. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
# Deploy Dali/Sasha/DropZone/Roxie/Thor Pods as Deployment/EFS | ||
|
||
Generally this is the preferred way to deploy a cluster with EFS, since the share mode is typically ReadWriteMany. | ||
|
||
The current deployment has Sasha/DropZone in the support Pod. | ||
|
||
EFS performance may not be as good as EBS, but EFS is very convenient: | ||
- No need to worry about crossing AZs | ||
- Easy to share and re-use data | ||
- No need to worry about deleting volumes after deleting a Pod | ||
|
||
EFS is a little more expensive than EBS. | ||
|
||
## Performance | ||
to do (compare EFS and EBS) | ||
|
||
## Prerequisites | ||
- Bootstrap | ||
```console | ||
bin/bootstrap-aws.sh | ||
``` | ||
- Start the NFS server (from the efs/ directory) | ||
```console | ||
./apply.sh | ||
``` | ||
apply.sh applies rbac.yaml and manifest.yaml. | ||
To display NFS pod: | ||
```console | ||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d15h | ||
``` | ||
To display PV and PVC: | ||
```console | ||
kubectl get pv | ||
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE | ||
pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc 1Mi RWX Delete Bound default/efs aws-efs 2d15h | ||
|
||
kubectl get pvc | ||
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE | ||
efs Bound pvc-bdf95dd2-d820-11e9-87ee-0e00576dcdfc 1Mi RWX aws-efs 2d15h | ||
``` | ||
|
||
The Volume Claim name is "efs". The storage class is "aws-efs" | ||
|
||
## Deploy HPCC Systems Cluster | ||
```console | ||
./start | ||
``` | ||
To make sure they are up: | ||
```console | ||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
dali 1/1 Running 0 51s | ||
efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d15h | ||
esp-esp1-f5bc48677-znlsv 1/1 Running 0 49s | ||
hpcc-admin 1/1 Running 0 52s | ||
roxie-roxie1-84f9578895-fbpld 1/1 Running 0 47s | ||
roxie-roxie1-84f9578895-gpf6x 1/1 Running 0 47s | ||
roxie-roxie2-6cf55ffd45-nf2mp 1/1 Running 0 46s | ||
support-db468c5c9-8kqzd 1/1 Running 0 50s | ||
thor-thor1-59876665f5-67p4p 1/1 Running 0 44s | ||
thor-thor1-59876665f5-nn7wc 1/1 Running 0 44s | ||
thormaster-thor1 1/1 Running 0 45s | ||
``` | ||
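Instead of re-running `kubectl get pods` until every Pod shows `Running`, the check can be scripted. The following is a minimal sketch using the standard `kubectl wait` command. The helper name `wait_for_pods` and the 300s default timeout are assumptions made for illustration.

```shell
# Sketch only: wait_for_pods is a hypothetical helper, not a script in this repo.
# Usage: wait_for_pods [timeout]   (default: 300s)
wait_for_pods() {
    timeout="${1:-300s}"
    # Block until every Pod in the current namespace has condition Ready,
    # or fail with a non-zero status once the timeout expires.
    kubectl wait --for=condition=Ready pod --all --timeout="$timeout"
}
```

A typical use is `wait_for_pods 600s` before checking the HPCC component status.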
|
||
The cluster should be automatically configured and started. | ||
To verify the status | ||
```console | ||
bin/cluster_run.sh status | ||
Status of dali: | ||
mydafilesrv ( pid 972 ) is running ... | ||
mydali ( pid 1166 ) is running ... | ||
|
||
Status of esp-esp1-f5bc48677-znlsv: | ||
mydafilesrv ( pid 991 ) is running ... | ||
esp1 ( pid 1185 ) is running ... | ||
|
||
Status of roxie-roxie1-84f9578895-fbpld: | ||
mydafilesrv ( pid 978 ) is running ... | ||
roxie1 ( pid 1177 ) is running ... | ||
|
||
Status of roxie-roxie1-84f9578895-gpf6x: | ||
mydafilesrv ( pid 978 ) is running ... | ||
roxie1 ( pid 1177 ) is running ... | ||
|
||
Status of roxie-roxie2-6cf55ffd45-nf2mp: | ||
mydafilesrv ( pid 979 ) is running ... | ||
roxie2 ( pid 1178 ) is running ... | ||
|
||
Status of support-db468c5c9-8kqzd: | ||
mydafilesrv ( pid 1010 ) is running ... | ||
mydfuserver ( pid 1204 ) is running ... | ||
myeclagent ( pid 1413 ) is running ... | ||
myeclccserver ( pid 1633 ) is running ... | ||
myeclscheduler ( pid 1851 ) is running ... | ||
mysasha ( pid 2059 ) is running ... | ||
|
||
Status of thor-thor1-59876665f5-67p4p: | ||
mydafilesrv ( pid 972 ) is running ... | ||
|
||
Status of thor-thor1-59876665f5-nn7wc: | ||
mydafilesrv ( pid 972 ) is running ... | ||
|
||
Status of thormaster-thor1: | ||
mydafilesrv ( pid 978 ) is running ... | ||
thor1 ( pid 1243 ) is running with 2 slave process(es) ... | ||
|
||
``` | ||
|
||
|
||
## Access ECLWatch | ||
Get the esp public address: | ||
```console | ||
kubectl get service | ||
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE | ||
ew-esp1 LoadBalancer 10.100.248.99 a0e9629f2da3811e9b1b40aa0a5b6276-2050531242.us-east-1.elb.amazonaws.com 8010:30108/TCP 3m52s | ||
|
||
``` | ||
ECLWatch URL: http://a0e9629f2da3811e9b1b40aa0a5b6276-2050531242.us-east-1.elb.amazonaws.com:8010 | ||
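The URL can also be built without copying the hostname by hand. The sketch below reads the load balancer hostname with a `kubectl` JSONPath query. The helper name `eclwatch_url` is hypothetical; the defaults `ew-esp1` and `8010` are taken from the service listing above and may differ in other setups.

```shell
# Sketch only: eclwatch_url is a hypothetical helper, not a script in this repo.
# Usage: eclwatch_url [service-name] [port]
eclwatch_url() {
    service="${1:-ew-esp1}"
    port="${2:-8010}"
    # Read the external hostname assigned to the LoadBalancer service.
    host=$(kubectl get service "$service" \
        -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') || return 1
    # The hostname is empty until the cloud provider has provisioned the balancer.
    [ -n "$host" ] || return 1
    echo "http://$host:$port"
}
```

The function returns a non-zero status when the service does not exist or has no external hostname yet, so it can be retried in a loop.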
|
||
## Scale up/down | ||
The original roxie-roxie1 cluster has 2 instances. To increase it to 6 instances: | ||
```console | ||
kubectl scale --replicas 6 Deployment/roxie-roxie1 | ||
|
||
kubectl get pods | ||
NAME READY STATUS RESTARTS AGE | ||
dali 1/1 Running 0 6m10s | ||
efs-provisioner-57965c4946-7w4b5 1/1 Running 0 2d15h | ||
esp-esp1-f5bc48677-znlsv 1/1 Running 0 6m8s | ||
hpcc-admin 1/1 Running 0 6m11s | ||
roxie-roxie1-84f9578895-7p4qz 1/1 Running 0 29s | ||
roxie-roxie1-84f9578895-8tpxp 1/1 Running 0 29s | ||
roxie-roxie1-84f9578895-fbpld 1/1 Running 0 6m6s | ||
roxie-roxie1-84f9578895-gpf6x 1/1 Running 0 6m6s | ||
roxie-roxie1-84f9578895-tjlp9 1/1 Running 0 29s | ||
roxie-roxie1-84f9578895-v62dj 1/1 Running 0 29s | ||
roxie-roxie2-6cf55ffd45-nf2mp 1/1 Running 0 6m5s | ||
support-db468c5c9-8kqzd 1/1 Running 0 6m9s | ||
thor-thor1-59876665f5-67p4p 1/1 Running 0 6m3s | ||
thor-thor1-59876665f5-nn7wc 1/1 Running 0 6m3s | ||
thormaster-thor1 1/1 Running 0 6m4s | ||
|
||
``` | ||
To scale it back | ||
```console | ||
kubectl scale --replicas 2 Deployment/roxie-roxie1 | ||
``` | ||
|
||
|
||
## Stop/Start Cluster | ||
stop | ||
```console | ||
bin/cluster-run stop | ||
``` | ||
start | ||
```console | ||
bin/cluster-run start | ||
``` | ||
|
||
Get status | ||
```console | ||
bin/cluster-run status | ||
|
||
``` | ||
|
||
## Delete Cluster ### | ||
```console | ||
./stop | ||
``` | ||
This does not delete volumes. Use either the AWS CLI or the EC2 console to delete them. | ||
|
||
|