StackStorm HA, scaling, k8s #16

Open
@lakshmi-kannan

StackStorm HA enhancements

### st2sensorcontainer

* Improve reliability of st2 sensors by having a native way for a node to take over a sensor if
  an owner fails. (Sensor replicas with partition map makes sense?)
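One way to picture the "partition map" idea is rendezvous (highest-random-weight) hashing, sketched below. This is only an illustration and not StackStorm's actual partitioning scheme: the sensor names, node names, and the `partition_map` helper are all hypothetical. Each sensor is owned by the live node with the highest hash score, so when a node fails only the sensors it owned are reassigned.

```python
import hashlib


def _score(sensor: str, node: str) -> int:
    """Stable pseudo-random weight for a (sensor, node) pair."""
    digest = hashlib.sha256(f"{sensor}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(sensor: str, live_nodes: list) -> str:
    """Pick the live node with the highest score for this sensor."""
    return max(live_nodes, key=lambda node: _score(sensor, node))


def partition_map(sensors: list, live_nodes: list) -> dict:
    """Map every sensor to its current owner among the live nodes."""
    return {sensor: owner(sensor, live_nodes) for sensor in sensors}


# Hypothetical sensor and node names, for illustration only.
sensors = ["FileWatchSensor", "WebhookSensor", "KafkaSensor", "TimerSensor"]
nodes = ["node-a", "node-b", "node-c"]

before = partition_map(sensors, nodes)
# Simulate node-b failing: recompute the map over the surviving nodes.
after = partition_map(sensors, [n for n in nodes if n != "node-b"])
moved = {s for s in sensors if before[s] != after[s]}
print("reassigned after node-b failure:", sorted(moved))
```

A real implementation would still need failure detection (for example, heartbeats through the coordination service) to decide when a node drops out of `live_nodes`; the sketch only covers the reassignment step.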

### st2rulesengine

* Currently, timers run inside st2rulesengine, which makes them incompatible with HA. We should fix this.
    - One option is to split the timers out into a separate process (st2timersengine) and offload
    responsibility for its uptime to Kubernetes

### st2actionrunner

* Policies require HA redis/zookeeper for coordination. Verify with redis.

### st2resultstracker

* Verify that we can spin up multiple instances with the new callback design for Mistral
* Update documentation https://docs.stackstorm.com/reference/ha.html#st2resultstracker

### st2notifier

* Documentation is confusing - https://docs.stackstorm.com/reference/ha.html#st2resultstracker
* Verify we can spin up more than one st2notifier without Redis/ZK for coordination
    - What happens when we don't have a coordination service?

### Common

* We should do chaos monkey testing by sending SIGHUP to some processes and seeing how things react

* We should figure out new version deployments - Blue/Green or rolling?
    - I propose blue/green for Kubernetes-based deployments, but this may be hard to do for non-Kubernetes deployments

* We should figure out how to upgrade packs
    - I have no idea how we are going to do this

StackStorm scaling

CustomerX wants a peak of 1200 concurrent automations (we don't know whether they mean individual actions
or workflows). They also want to run 20000 automations per day, which averages out to about 14 executions/min.
I am pretty sure we can handle 14 executions/min, but we definitely won't be able to run 1200 concurrent
Mistral workflows. We should test this.
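The per-minute figure above is a plain daily average, which the following quick check reproduces. It assumes the load is spread evenly over 24 hours; real traffic is likely burstier, so the peak rate could be well above this.

```python
# Average execution rate implied by 20000 automations per day,
# assuming an even spread over the whole day.
AUTOMATIONS_PER_DAY = 20_000
MINUTES_PER_DAY = 24 * 60

per_minute = AUTOMATIONS_PER_DAY / MINUTES_PER_DAY
per_second = per_minute / 60

print(f"{per_minute:.1f} executions/min, {per_second:.2f} executions/s")
# prints "13.9 executions/min, 0.23 executions/s"
```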

StackStorm HA in Docker

* Right now our image size is 390M. That is way too large for us to pull and deploy.
    - Should we build a package of st2 that works with the apk package manager in Alpine Linux?
        - Do we need this, or can we just git clone the source and install the pip dependencies?
        - We don't need dh-virtualenv because we are now inside Docker?
        - Can we quickly figure out the image size?
* Helm

StackStorm HA Deployment

* Ansible playbooks for HA deployment on bare metal/VMs
    - Not much to think about here other than building playbooks for one reference OS (Ubuntu 18.04)
        - BTW we should build OS packages for Ubuntu 18.04
    - We have to think about secret configuration entries (in st2.conf and packs)
        - At a minimum, we should have a way to deploy st2.conf with secrets
* The Kubernetes deployment story on cloud varies by provider. We should also account for
  on-prem Kubernetes deployments. We should figure out which cloud providers we want to address
  directly and for which ones we rely on the community (should we rely on the community?)

    Common

K8s in various clouds/on-prem

AWS

- When EKS (Hosted Kubernetes) is out, I think people would prefer to use that. I hope it is integrated
  with Amazon secret store and Amazon Parameter store.
- If EKS isn't an option, we should look into kops for Kubernetes deployment and solve secrets and config ourselves
- ECS is not an option (Since colocation would be a problem)
- Read https://news.ycombinator.com/item?id=15808065 (Especially comments)

GCP

- GKE (managed Kubernetes; most attractive, but some advanced customization is not possible)
- kops works with GCP

Azure

- ACS
- No kops https://github.com/kubernetes/kops/issues/3957
- Should we even do this?

On-prem

- Kubespray (uses ansible under the hood - https://kubernetes.io/docs/getting-started-guides/kubespray/)
- We should definitely leave this to the community and see if there are any takers for Kubespray
