Description
StackStorm HA enhancements
### st2sensorcontainer
* Improve the reliability of st2 sensors by giving a node a native way to take over a sensor if
its owner fails. (Would sensor replicas with a partition map make sense?)
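
To make the "partition map" idea concrete, here is a minimal sketch of how sensors could be assigned to nodes and handed over when an owner fails. It is an illustration only, not how st2sensorcontainer works today: the `PartitionMap` class, the node names, and the sensor refs are all hypothetical, and a real implementation would need a coordination backend to detect failed nodes.

```python
from collections import defaultdict


class PartitionMap:
    """Toy sensor-to-node partition map (hypothetical; not an st2 API)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.owner = {}  # sensor ref -> owning node

    def _least_loaded(self):
        load = defaultdict(int)
        for node in self.owner.values():
            load[node] += 1
        # Iterate nodes in sorted order so ties are broken deterministically.
        return min(sorted(self.nodes), key=lambda n: load[n])

    def assign(self, sensor):
        """Give the sensor to the live node that currently owns the fewest."""
        if not self.nodes:
            raise RuntimeError("no live nodes")
        self.owner[sensor] = self._least_loaded()
        return self.owner[sensor]

    def fail_node(self, node):
        """Mark a node as dead and hand its sensors to the survivors."""
        self.nodes.discard(node)
        orphaned = sorted(s for s, n in self.owner.items() if n == node)
        for sensor in orphaned:
            del self.owner[sensor]
        for sensor in orphaned:
            self.assign(sensor)
        return orphaned
```

With two nodes and three sensors, calling `fail_node("node-a")` moves every sensor owned by `node-a` to `node-b`. The sketch rebalances only the orphaned sensors, so healthy sensors are not restarted during a takeover.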
### st2rulesengine
* Currently, timers run inside st2rulesengine, which is not HA compatible. We should fix this.
  - One option is to run st2timersengine as a separate process and offload responsibility
    for its uptime to Kubernetes
### st2actionrunner
* Policies require an HA Redis/ZooKeeper deployment for coordination. Verify with Redis.
### st2resultstracker
* Verify that we can spin up multiple instances with the new callback design for Mistral
* Update the documentation: https://docs.stackstorm.com/reference/ha.html#st2resultstracker
### st2notifier
* The documentation is confusing: https://docs.stackstorm.com/reference/ha.html#st2resultstracker
* Verify that we can spin up more than one st2notifier without Redis/ZK for coordination
  - What happens when we don't have a coordination service?
### Common
* We should do chaos monkey testing by sending SIGHUP to some processes and seeing how things react
* We should figure out new version deployments: blue/green or rolling?
  - I propose blue/green for Kubernetes-based deployments, but this may be hard to do for non-Kubernetes deployments
* We should figure out how to upgrade packs
  - I have no idea how we are going to do this
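
As a starting point for the chaos testing item, the selection-and-signal step could look like the sketch below. The `hup_random` helper and the service-to-PID mapping are hypothetical. How each st2 service reacts to SIGHUP is exactly what the test would need to find out, so nothing here assumes a particular reaction.

```python
import os
import random
import signal


def hup_random(pids_by_service, count=1, kill=os.kill, rng=random):
    """Send SIGHUP to `count` randomly chosen services and return their names.

    `pids_by_service` maps a service name to its PID. `kill` and `rng` are
    injectable so the selection logic can be dry-run without signalling
    real processes.
    """
    names = sorted(pids_by_service)
    victims = rng.sample(names, k=min(count, len(names)))
    for name in victims:
        kill(pids_by_service[name], signal.SIGHUP)
    return victims
```

Passing a recording function as `kill` gives a dry run that shows which services would be hit; passing a seeded `random.Random` as `rng` makes a failing run reproducible.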
StackStorm scaling
CustomerX wants a peak of 1200 concurrent automations (we don't know if they mean individual actions
or workflows). They also want to be able to run 20000 automations per day, which is around 14 executions/min.
I am pretty sure we can handle 14 executions/min, but we definitely won't be able to run 1200 concurrent
Mistral workflows. We should test this.
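
The two numbers can be related with a quick back-of-the-envelope calculation. The sketch below assumes arrivals are spread evenly over the day and applies Little's law (concurrency = arrival rate × average duration); both are assumptions on my part, since the customer has not told us how their load is distributed.

```python
AUTOMATIONS_PER_DAY = 20_000
PEAK_CONCURRENT = 1_200
MINUTES_PER_DAY = 24 * 60

# Average arrival rate if the daily volume were spread evenly.
rate_per_min = AUTOMATIONS_PER_DAY / MINUTES_PER_DAY

# Little's law: concurrency = arrival rate * average duration.
# Average duration needed to reach the peak at the average arrival rate.
implied_avg_duration_min = PEAK_CONCURRENT / rate_per_min

print(f"{rate_per_min:.1f} executions/min")                    # 13.9 executions/min
print(f"{implied_avg_duration_min:.1f} min average duration")  # 86.4 min average duration
```

Under these assumptions, 1200 concurrent automations would only occur if each one ran for roughly 86 minutes on average. Either the automations are long-running, or the arrivals are very bursty; that is worth clarifying with the customer before testing.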
StackStorm HA in Docker
- Right now our image size is 390M. That is way too large for us to pull and deploy.
  - Should we build a package of st2 that works with the apk package manager in Alpine Linux?
    - Do we need this, or can we just git clone the source and install the pip dependencies?
    - We don't need dh-virtualenv because we are now inside Docker?
    - Can we quickly figure out the image size?
- Helm
  - Leave this to community but make sure we have a Helm chart
  - Helm Chart st2-docker#126
StackStorm HA Deployment
- Ansible playbooks for HA deployment on bare metal/VMs
  - Not much to think about here other than building playbooks for one reference OS (Ubuntu 18.04)
    - BTW, we should build OS packages for Ubuntu 18.04
    - We have to think about secret configuration entries (in st2.conf and packs)
    - At a minimum, we should have a way to deploy st2.conf with secrets
- The Kubernetes story for deployment on cloud varies by provider. We should also account for
  on-prem Kubernetes deployments. We should figure out which cloud providers we want to address
  directly and for which ones we rely on the community (should we rely on the community?)

Common
- What kind of OS do we want to run on top of?
  - Evaluate container OSes like CoreOS, RancherOS, DC/OS, Project Atomic, Ubuntu Snappy, ...
    - Decide whether we really need a container host OS or should just run on Ubuntu (we shouldn't really try to support multiple host OSes)
      - We can use the same OS in both cloud providers and on-prem to control experience and support
      - Automated over-the-air updates
      - Yet another technology
      - https://www.inovex.de/blog/docker-a-comparison-of-minimalistic-operating-systems/
    - I am leaning towards CoreOS because of the ability to spin up instances in any cloud,
      and etcd is natively available for service discovery. It would be a hard sell to ask
      enterprises to run CoreOS in their data centers. Some probing here would be good.
- How are we going to manage configurations?
  - This is a tricky one
    - People typically use environment variables.
    - People also try to use etcd/consul with confd inside the container: https://github.com/kelseyhightower/confd
    - Amazon Parameter Store
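
To illustrate the environment-variable option, here is a minimal sketch that overlays environment variables onto an INI-style config such as st2.conf. The `ST2_<SECTION>__<OPTION>` naming convention and the `apply_env_overrides` helper are made up for this example; st2 does not define them.

```python
import configparser
import io
import os

PREFIX = "ST2_"  # hypothetical convention: ST2_<SECTION>__<OPTION>


def apply_env_overrides(base_conf_text, environ=os.environ):
    """Return config text with ST2_<SECTION>__<OPTION> variables applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(base_conf_text)
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        # Split "DATABASE__HOST" into section "database" and option "host".
        section, sep, option = key[len(PREFIX):].lower().partition("__")
        if not (section and sep and option):
            continue  # not in the expected shape; ignore it
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, option, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

A container entrypoint could call this once at startup and write the result to the config path before launching the service. This keeps the image generic, though values that are secrets would still need a safer channel than plain environment variables.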
- How are we going to manage secrets?
  - AWS secrets store, vault (see aws-vault), https://kubernetes.io/docs/concepts/configuration/secret/,
    https://lyft.github.io/confidant/
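
For the Kubernetes Secrets option, one pattern is to mount the Secret as a volume, where each key appears as a file, and read the values at startup. The sketch below assumes that layout; the `load_mounted_secrets` helper and any directory passed to it are hypothetical, and the other stores listed above would need their own clients.

```python
from pathlib import Path


def load_mounted_secrets(secret_dir):
    """Read one secret per file, as a Kubernetes Secret volume lays them out."""
    secrets = {}
    for path in sorted(Path(secret_dir).iterdir()):
        # Skip hidden entries (Kubernetes keeps bookkeeping symlinks such as
        # ..data in the mount) and anything that is not a regular file.
        if path.name.startswith(".") or not path.is_file():
            continue
        # Drop a trailing newline that often sneaks in when secrets are
        # created from files or echo output.
        secrets[path.name] = path.read_text().rstrip("\n")
    return secrets
```

The returned dictionary could then be merged into the rendered st2.conf, so that secret values never have to be baked into the image.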
K8s in various clouds/on-prem
AWS
- When EKS (hosted Kubernetes) is out, I think people will prefer to use that. I hope it is integrated
with the Amazon secrets store and Amazon Parameter Store.
- If EKS isn't an option, we should look into kops for Kubernetes deployment and solve secrets and config ourselves
- ECS is not an option (since colocation would be a problem)
- Read https://news.ycombinator.com/item?id=15808065 (Especially comments)
GCP
- GKE (managed Kubernetes; most attractive, but some advanced customization is not possible)
- kops works with GCP
Azure
- ACS
- No kops https://github.com/kubernetes/kops/issues/3957
- Should we even do this?
On-prem
- Kubespray (uses ansible under the hood - https://kubernetes.io/docs/getting-started-guides/kubespray/)
- We should definitely leave this to community and see if there are any takers for kubespray