
[toc]

Running the API on Docker Swarm (Linux)

If a swarm is already running, jump ahead to Deploy the API Stack; otherwise, use the instructions below to get a swarm running.

Installing Docker

  1. Install Docker on the target EC2 instances (these instructions assume Ubuntu 16.04):

Set up the Docker apt repository:

> sudo apt-get update
> sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Now verify that the key fingerprint for the Docker apt repository is correct:

> sudo apt-key fingerprint 0EBFCD88

Now set up the stable repository:

> sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

And install docker:

> sudo apt-get update
> sudo apt-get install -y docker-ce

Installing Docker-compose

The docker-compose binary is, for some horrible reason, distributed separately from standalone Docker. Find the latest release on the docker-compose releases page, and curl it into place:

> sudo curl -L https://github.com/docker/compose/releases/download/1.19.0/docker-compose-Linux-x86_64 -o /usr/local/bin/docker-compose
> sudo chmod +x /usr/local/bin/docker-compose

Replace 1.19.0 with the latest stable release number.
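Since the release number appears in the URL, it can help to build the download URL in one place. The `compose_url` helper below is purely illustrative (it is not part of the Docker tooling); the version, OS, and architecture are passed as arguments:

```shell
# Illustrative helper: build the docker-compose download URL for a given
# release and platform, so the version number only has to change in one place.
compose_url() {
  local version="$1" os="$2" arch="$3"
  echo "https://github.com/docker/compose/releases/download/${version}/docker-compose-${os}-${arch}"
}

# Matches the URL used in the curl command above.
compose_url 1.19.0 Linux x86_64
```

On the target instance you could substitute `$(uname -s)` and `$(uname -m)` for the OS and architecture arguments.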

Setting Up the Swarm

When setting up a swarm environment on AWS, the following ports will need to be open between all machines in the cluster: 2181, 2377, 4789, 5000, 7946, 9455, and 9555.

The ports used for communication with the API will need to be open as well, so remote clients can connect. This includes ports 17141 and 17504.

Full port list: 2181, 2377, 4789, 5000, 7946, 9455, 9555, 17141, 17504.

The utility of each of these ports:

  • 2181: Used by Zookeeper and Kafka for service management (TCP/UDP)
  • 2377: Used by Docker Swarm for cluster management (TCP/UDP)
  • 4789: Used by Docker Swarm overlay network (TCP/UDP)
  • 5000: Temporary Docker Registry port for container distribution to Docker Swarm nodes (HTTP)
  • 7946: Docker Swarm control traffic
  • 9455: Swarm internal port for Kafka communication (HTTP/TLS)
  • 9555: Swarm external port for Kafka communication (HTTP/TLS)
  • 17141: Sensing API Insecure port (HTTP)
  • 17504: Sensing API Secure port (HTTP/TLS)
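On AWS, opening these ports typically means adding security-group ingress rules. The loop below is a sketch that only prints the AWS CLI calls rather than running them; the security group ID and CIDR are placeholders, and ports marked TCP/UDP above would need a second call with `--protocol udp`:

```shell
# Sketch: print (not execute) the AWS CLI calls that would open each swarm
# port. SG_ID and CLUSTER_CIDR are placeholders for your environment.
SG_ID="sg-PLACEHOLDER"
CLUSTER_CIDR="10.0.0.0/16"
for port in 2181 2377 4789 5000 7946 9455 9555 17141 17504; do
  echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $port --cidr $CLUSTER_CIDR"
done
```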

In addition, internal DNS (handled by Route 53) must include the following records, for now all pointing at the Swarm manager node:

  • sensing-api.savior.internal
  • sensing-ca.savior.internal
  • sensing-kafka.savior.internal

Manager Node

On the machine that will be the swarm manager node, run:

sudo docker swarm init --advertise-addr $(ifconfig | awk '/inet addr/{print substr($2,6)}' | grep 10.)

and record the output of the swarm init, which will look something like:

To add a worker to this swarm, run the following command:

	docker swarm join --token SWMTKN-1-0ktxq9kcuz9t6kvw9559wq5r3i2qsqk5lx0f55y0rilnw719p1-02y6jh213sezys4cwn4uvv2dg 10.0.4.61:2377
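The `ifconfig` pipeline in the init command above pulls out the node's private 10.x address to use as the advertise address. Run against sample Ubuntu 16.04 `ifconfig` output (fabricated here for illustration), it behaves like this:

```shell
# Fabricated sample of Ubuntu 16.04 ifconfig output; the pipeline is the
# same one used in the swarm init command above.
sample='eth0      Link encap:Ethernet  HWaddr 0a:1b:2c:3d:4e:5f
          inet addr:10.0.4.61  Bcast:10.0.4.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0'

# awk strips the leading "addr:"; grep keeps only the 10.x address,
# filtering out the 127.0.0.1 loopback line.
advertise_addr=$(echo "$sample" | awk '/inet addr/{print substr($2,6)}' | grep 10.)
echo "$advertise_addr"   # prints 10.0.4.61
```

Note that this parsing depends on the older `ifconfig` output format; on newer distributions that ship `ip` instead, the address would need to be extracted differently.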

Worker Nodes

On all of the worker nodes, use the following command to join the swarm (replacing the --token value with the actual value generated by the manager at startup):

sudo docker swarm join --token SWMTKN-1-0ktxq9kcuz9t6kvw9559wq5r3i2qsqk5lx0f55y0rilnw719p1-02y6jh213sezys4cwn4uvv2dg 10.0.4.61:2377

If you didn't record the join token from the manager when you started the swarm, you can retrieve it on the manager with the command:

sudo docker swarm join-token worker

Removing a worker node from the swarm is straightforward; the command must be run from the node being removed:

sudo docker swarm leave

Start the APINET network

Start the external docker overlay network

> sudo docker network create --driver overlay --attachable --subnet 192.168.1.0/24 apinet

Notice that we explicitly set a subnet for the Swarm network. If we don't, the default subnet Docker picks conflicts with the default subnet of the AWS VPC (overlapping 10.0.1.0/24 segments), which wreaks havoc with DNS and container routing. The name of this network, apinet, must match the external network name defined in the docker-compose-swarm.yml compose file.

Pull the API code

Our Virtue/SAVIOR repository is a private GitHub repo, so you'll need either a clone URL with an embedded access token, or you can export your token into the Bash environment with:

export GITHUB_TOKEN=<your token here>

Checkout a copy of the Savior repository:

git clone "https://$GITHUB_TOKEN@github.com/twosixlabs/savior.git"

Make sure you're on the branch you intend to run from.
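Because the token is interpolated into the clone URL, an unset GITHUB_TOKEN produces a confusing authentication prompt. A hypothetical pre-flight check (not part of the repo's scripts) can fail early instead:

```shell
# Hypothetical guard: fail early if GITHUB_TOKEN is unset, rather than
# letting git prompt for credentials mid-clone.
if [ -z "${GITHUB_TOKEN:-}" ]; then
  echo "GITHUB_TOKEN is not set; export it before cloning" >&2
else
  echo "token present"
  # git clone "https://$GITHUB_TOKEN@github.com/twosixlabs/savior.git"
fi
```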

Setup a Docker Registry

Moving images built with docker-compose between the nodes of a Docker swarm requires a registry. Rather than using the global Docker Hub registry, we spin up our own as part of the deploy step. Start the registry with:

> sudo docker service create --name registry --publish 5000:5000 registry:2

You can confirm that the registry is running with:

> curl http://localhost:5000/v2/
{}

The empty JSON object is the expected response.

Build and Push Containers

We need to build our images and push them to our local registry:

sudo /usr/local/bin/docker-compose -f docker-compose-swarm.yml build
sudo /usr/local/bin/docker-compose -f docker-compose-swarm.yml push

Deploy the API Stack

Before deploying to the swarm or building the swarm network, source the swarm_setup.sh script to prep the host environment:

. ./bin/swarm_setup.sh

Instead of directly invoking the docker-compose command, we'll deploy the API as described by the docker-compose-swarm.yml compose file using the docker stack interface to the Swarm.

Deploy everything with:

> sudo docker stack deploy --compose-file docker-compose-swarm.yml savior-api

You can check what's running in the service with:

sudo docker stack services savior-api

You can generally check that things are running smoothly by looking for errors in the API logs:

> sudo docker service logs -f savior-api_api

For debugging the current state of services, you can get a non-truncated PS result from the stack with:

> sudo docker stack ps savior-api --no-trunc

Tear down the stack with:

> sudo docker stack rm savior-api

Tear down the network

> sudo docker network rm apinet

Load Configurations

If this is the first run for the API on this swarm, the database may need to be seeded with sensor configurations:

> ./bin/load_sensor_configurations.py

OPS

Restarting Services

Individual services can be restarted/updated with:

> sudo docker service update --force savior-api_api

Where the savior-api_ prefix is determined by the name we gave the deployed stack, and the suffix is the name of the service in the docker compose file.

Logs

Get logs for individual services:

> sudo docker service logs -f savior-api_api

As above, the savior-api_ prefix comes from the stack name and the suffix from the service name in the compose file.

Execute on Service

Start an interactive bash session on any of the services:

> sudo docker exec -ti savior-api_api.1.$(sudo docker service ps -f 'name=savior-api_api.1' savior-api_api -q) /bin/bash

This is more complicated than standard docker exec commands because of the naming format for service deployments.
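Since the task-ID lookup is easy to mistype, it can be wrapped in a small helper. The `svc_shell_cmd` function below is a hypothetical convenience (not part of the repo) that prints the exec command for a given service and task slot rather than running it:

```shell
# Hypothetical wrapper: print (not run) the docker exec command for a given
# service task, so the task-ID lookup only has to be written once.
svc_shell_cmd() {
  local svc="$1" slot="${2:-1}"
  echo "sudo docker exec -ti ${svc}.${slot}.\$(sudo docker service ps -f 'name=${svc}.${slot}' ${svc} -q) /bin/bash"
}

# Reproduces the command shown above for the first task of savior-api_api.
svc_shell_cmd savior-api_api 1
```

Pipe the output to a shell, or just copy-paste it, once you've confirmed it names the task you want.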

Solving no such image localhost:5000/... issues

The Docker registry:2 service we deploy on the swarm should be reachable from any node in the swarm at localhost:5000, due to the load-balancing done by the Swarm for exposed service ports. Sometimes, though, things go wrong.

If you consistently see no such image localhost:5000/... errors from docker service ps savior-api_... calls for any image, chances are the Swarm routing overlay network isn't communicating, or is otherwise unable to load balance. The easiest way to restore service (after checking that network ACLs haven't changed) is to have each worker node leave the swarm with docker swarm leave and then re-join. This resets the overlay routing.