# Reviving service deployment docs #631

Draft · wants to merge 2 commits into `master` · Changes from 1 commit
`ocfweb/docs/docs/staff/procedures/deploying-services.md` — 373 additions, 0 deletions

---
[[!meta title="Deploying Services"]]

## Overview
Are you looking to deploy a new service to the OCF Kubernetes cluster? This
document will cover all of the steps required to do so. Do note that this is **not** a
substitute for a Kubernetes tutorial or a Docker tutorial _(there are many
resources online for that)_ but a guide for getting your service running on the
OCF. There are a lot of pieces of OCF infrastructure at play here, but this
should serve as a gentle introduction to how things are done.

This `HOWTO` uses one of the OCF's simplest services as an example:
[Templates][templates]. Templates is a service used internally by OCF staff
that serves 'copy-pasteable' email templates. You can use [[git|doc
staff/backend/git]] to `clone` the
[repo](https://github.com/ocf/templates/blob/master/kubernetes/templates.yml.erb),
or just follow along looking at the files.
> **Member:** nit: I'd link to the root of the repo instead of just the kubernetes config

If you want to blaze ahead, this repo should serve as a good example of the
components required for deployment.

## Prerequisites
If you're looking to deploy your service, you should first have a service to
deploy. This is normally done with a repository under the
[OCF GitHub organization](https://github.com/ocf). If you have the correct
permissions, you should be able to transfer, fork, or just create a
repository in the org.
> **Member** (on lines +22 to +24): I'd assume that people reading this guide are less experienced at creating services and likely don't have these permissions. Instead, I'd probably suggest to ask in #rebuild or somewhere similar for creating a repo (or do so if you have the permissions already).
>
> As a tangent to this, I am aiming to have this be done through https://github.com/ocf/terraform at some point in the future, as it has a GitHub provider and that would be a nice way to not rely on individual permissions but instead define the state of the ocf org in version-controlled code. I think that would make this part nicer, as creating a new repo would then involve creating and merging a PR and essentially nothing else.


### Docker & Jenkins
> **Member:** It would be better for people who want to deploy services to just clone or copy these files somehow. They're just going to copy paste from this doc anyway. I would wager that the vast majority of people who want to deploy services on kubernetes don't need to know how this works, it just has to "work." (Of course, we should provide docs somewhere that explain how it works anyway.)
>
> You could have them run some script that takes in their proposed service name and it could even generate a basic template for those kubernetes yaml files too.
>
> Probably out of scope for this PR, but something to consider for the future.

> **Member:** GitHub has template repositories that can help solve the issue of copying files, so we can create one for Kubernetes deployments.


For your service to be deployed onto the Kubernetes cluster, it must have
a `Dockerfile` so that we can containerize the service. Nothing OCF specific
here, just include whatever your service needs.
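For instance, since Templates just serves static files with `nginx`, its `Dockerfile` could look something like the sketch below (the file names `nginx.conf` and `src/` are hypothetical; the real repo may differ):

```
FROM nginx:latest

# nginx config that listens on an unprivileged port (8000); containers
# usually don't run as root, so ports below 1024 are off-limits.
COPY nginx.conf /etc/nginx/nginx.conf

# The static email templates themselves.
COPY src/ /usr/share/nginx/html/

EXPOSE 8000
```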

[[Jenkins|doc staff/backend/jenkins]] will try to build that container,
and then push the image to the OCF Docker repository. This has to be specified in
the `Makefile` at the root of the repository. Here's part of what templates has:
> **Member:** "Docker repository" is the word you're looking for


```
BIN := venv/bin

DOCKER_REVISION ?= testing-$(USER)
DOCKER_TAG = docker-push.ocf.berkeley.edu/templates:$(DOCKER_REVISION)
RANDOM_PORT := $(shell expr $$(( 8000 + (`id -u` % 1000) + 1 )))

.PHONY: dev
dev: cook-image
	@echo "Will be accessible at http://$(shell hostname -f ):$(RANDOM_PORT)/"
	docker run --rm -p "$(RANDOM_PORT):8000" "$(DOCKER_TAG)"

.PHONY: cook-image
cook-image:
	docker build --pull -t $(DOCKER_TAG) .

.PHONY: push-image
push-image:
	docker push $(DOCKER_TAG)
```

> **Member** (on the `RANDOM_PORT` line): For what it's worth, this `+ 1` is to space out ports so you can run multiple services under one user without them colliding. I think we should probably come up with a better port allocation strategy, or just use the same port for the same user each time and forego this method as it's pretty confusing if you don't understand it.

The important targets to pay attention to here are `cook-image` and `push-image`;
Jenkins will run these as part of the deployment pipeline that continuously
deploys to Kubernetes.
> **Member** (on lines +58 to +60): make test is also a good one to mention, it's automatically run if it exists as part of the service pipeline so that's a great place to put any tests that a service may have.
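Following that suggestion, a hypothetical `test` target might look like this (assuming a Python virtualenv at `venv/` as in the snippet above, with pytest installed and a `tests/` directory — all assumptions for illustration):

```
.PHONY: test
test:
	$(BIN)/pytest tests/
```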


Finally, we have to include a `Jenkinsfile` in the repository, so that Jenkins
knows how we want it to go about deploying the service. In this case, it just
specifies that we want to go through the standard service pipeline:
> **Member:** For this, you might want to link to https://github.com/ocf/shared-pipeline/blob/4537c1ce537a6beffa1075c719e015cd60cc54eb/vars/servicePipeline.groovy if anyone is curious what the service pipeline actually does. It's not incredibly complicated, but I could also see not wanting to bog this down too much with implementation details like this. I think it could be useful for someone wanting to know what commands are run as part of the build or what runs in parallel though (tests and cooking the docker image for instance).


```
servicePipeline(
    upstreamProjects: ['ocf/dockers/master'],
)
```
> **Member (author)** (on lines +62 to +70): This part is a little confusing to me, as in I described my understanding of it, but in other projects an empty upstreamProjects array also seemed to work fine. Perhaps @jvperrin could chime in on this?

> **Member:** Yeah, that's a great question! The only thing this does is mean that a service build pipeline gets kicked off if the upstream one listed here succeeds. The reason why you see `ocf/dockers/master` in a number of services is when they are both (1) built off the OCF base docker images and (2) redeploying them can be done smoothly. These redeploys are mainly done so that security updates are pulled into service containers in a timely manner, so it doesn't have to be a daily thing but could probably be weekly or something instead.
>
> For instance, ocfweb has both ocflib and dockers as upstreams because it depends on the latest versions of both of those repos, and because redeploying after either one finishes building successfully can be done smoothly. slackbridge on the other hand doesn't have any upstream projects listed despite being built off the OCF base image, because every time it rebuilt it caused large join/quit spam on IRC that wasn't a pleasant experience.
>
> Essentially, to figure out what's listed here for a new service, you only need to know what base image the service is using, and how disruptive a deploy would be.
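So, per the discussion above, a service that shouldn't be redeployed automatically (like the slackbridge example) can apparently just leave the upstreams empty; a sketch:

```
servicePipeline(
    // No upstream projects: the service only rebuilds on its own commits.
    upstreamProjects: [],
)
```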


## Kubernetes

In the root of your project create a `kubernetes` folder. This is where all
your Kubernetes configuration files will live. Templates, a relatively simple
service, is a single `nginx` server serving static content. Because this
application is self-contained we only need to create one file,
`kubernetes/templates.yml.erb` (it's an `.erb` template because Jenkins fills
in the image version, as we'll see below).

## Service
Since templates is a web service we will first create a `Service` object. The
first step to make your Kubernetes service internet-facing is to make your
application accessible within the Kubernetes cluster. In most cases you can
simply fill in this template.

```
apiVersion: v1
kind: Service
metadata:
  name: <myapp>-service
spec:
  selector:
    app: <myapp>
  ports:
    - port: 80
      targetPort: <docker-port>
```

The `name` field under `metadata` is the name Kubernetes uses to
identify your `Service` object when, for example, you run `kubectl get
services`. The `selector` is the label you will use to bind `Pods` to
this `Service` object. Fill in `targetPort` with the port that your
application uses _inside_ of the Docker container. In the case of templates we
bind to port `8000`. Note that `port` and `targetPort` differ because the
process inside the container normally runs unprivileged, as per security best
practices, and so can't bind to ports below 1024; the container listens on
`8000` while the `Service` exposes port `80`. Here is the `Service`
configuration for templates with all the fields filled in:
> **Member** (on lines +107 to +108): Might be good to briefly mention here that the ports are not the same between the two (80 on both for instance) because stuff running within the container typically can't bind on anything 1024 or below. It's usually not running as root or a privileged user, as per security best practices.
>
> Not sure that'll be clear to someone setting up a service for the first time otherwise.


```
apiVersion: v1
kind: Service
metadata:
  name: templates-service
spec:
  selector:
    app: templates
  ports:
    - port: 80
      targetPort: 8000
```

## Creating the deployment

Great! Now let's move on to creating our pods! To do this we'll create a
`Deployment` object. Deployments can become complicated with
application-specific configuration, but the simplicity of Templates elucidates
the bare-bones requirements for any `Deployment`.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <myapp>-deployment
  labels:
    app: <myapp>
spec:
  replicas: <#pods>
  selector:
    matchLabels:
      app: <myapp>
  template:
    metadata:
      labels:
        app: <myapp>
    spec:
      containers:
        - name: <container-name>
          image: "docker.ocf.berkeley.edu/<your-repo-name>:<%= version %>"
          resources:
            limits:
              memory: <#Mi>
              cpu: <cpus-in-millicores>m
          ports:
            - containerPort: <docker-port>
```

> **Member** (on the `image` line): Line 82 mentions templates.yaml, but to include this dynamic version, it'll actually need to be a templates.yaml.erb instead, to be an actual template of its own :P

This section can be a bit daunting, but we'll go through it step-by-step. Fill
in `<myapp>` and `<docker-port>` with the same values you used in your
`Service`. This will ensure your Pods are bound to the `Service` we previously
created.
> **Member:** Is this <myapp> instead of <app-name>?

`replicas` is the number of instances we want. Because Templates is
used internally by OCF staff, we aren't super concerned with uptime and create
only 1 instance. For a service like `ocfweb`, where uptime is crucial, we would
opt for 3 instances to handle failover.

The `containers` resource is where Kubernetes looks to obtain Docker images
to deploy. For production services this will _always_ be the OCF Docker
repository: `docker.ocf.berkeley.edu`. The field `<your-repo-name>` is the
name of the repository on the OCF GitHub, and `version` will be filled in
automatically by [[Jenkins|doc staff/backend/jenkins]]. For testing, it is
recommended you push your image to [DockerHub][dockerhub] or to
`docker.ocf.berkeley.edu` (talk to a root staffer in the latter case) and use
a hardcoded image name.

> **Member:** Docker repository

Lastly, we set our resource limits. Templates is a low-resource service so
we'll give it `128Mi` of memory and `50/1000` of a CPU core (Kubernetes uses
millicores for CPU units, so 1 core = 1000m). Do note that every instance of
the application gets these resources, so with _N_ instances you are using
_N × limits_.
> **Member** (on lines +177 to +179): ++ this is quite good to specify, as is the warning below about being stuck in a pending state


**WARNING**: On a low-resource development cluster, asking for too much CPU or
RAM can put your application in an infinite `Pending` loop, since the cluster
will never have enough resources to schedule your service (yes, this has
happened to us).
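If your pods do get stuck in `Pending`, `kubectl describe` is the usual way to see why; the events at the bottom of its output will typically say something like "Insufficient cpu" or "Insufficient memory" when the scheduler can't place a pod. For example:

```
# Check pod status first, then look at scheduling events.
kubectl -n <myapp> get pods
kubectl -n <myapp> describe pods
```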

With all the fields filled in we have this Deployment object for Templates.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: templates-deployment
  labels:
    app: templates
spec:
  replicas: 1
  selector:
    matchLabels:
      app: templates
  template:
    metadata:
      labels:
        app: templates
    spec:
      containers:
        - name: templates-static-content
          image: "docker.ocf.berkeley.edu/templates:<%= version %>"
          resources:
            limits:
              memory: 128Mi
              cpu: 50m
          ports:
            - containerPort: 8000
```

The last object we need to create for the Templates service is an `Ingress`.
We want to expose our service to the world with the fully-qualified domain
name `templates.ocf.berkeley.edu`. `Ingress` objects, like `Service` objects,
are similar for most services.
> **Member:** Does adding the name to https://github.com/ocf/puppet/blob/master/modules/ocf_kubernetes/manifests/master/loadbalancer.pp still apply? If so we need to include that step.


```
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: virtual-host-ingress
spec:
  rules:
    - host: <myapp>.ocf.berkeley.edu
      http:
        paths:
          - backend:
              serviceName: <myapp>-service
              servicePort: 80
```

Note that `serviceName` _must_ match the name used in the `Service`
object. Now that we have an `Ingress`, all requests with the `Host` header
`templates.ocf.berkeley.edu` will be directed to a Templates Pod!
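You can sanity-check this routing by hand even before DNS exists, by sending a request to the cluster's load balancer with the right `Host` header (the address below is a placeholder):

```
# The ingress uses the Host header to pick which Service gets the request.
curl -H "Host: templates.ocf.berkeley.edu" "http://<kubernetes-lb-address>/"
```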


## Deployment extras

### OCF DNS

If your application at any point uses OCF-specific DNS, like using the short
hostname `mysql` as opposed to `mysql.ocf.berkeley.edu` to access `MariaDB`,
then you need to add `ocf.berkeley.edu` as a search domain under your
deployment `spec`:
> **Member:** I'd probably mention the term "search domain" somewhere in here as that's the keyword that can help to find more information on the internet about this behavior.


```
dnsPolicy: ClusterFirst
dnsConfig:
  searches:
    - "ocf.berkeley.edu"
```
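This works because the search domain gets appended to the container's `/etc/resolv.conf` search list, so a lookup of `mysql` falls through to `mysql.ocf.berkeley.edu`. Roughly (cluster defaults vary, so treat this as illustrative):

```
# cat /etc/resolv.conf (inside the pod)
search <namespace>.svc.cluster.local svc.cluster.local cluster.local ocf.berkeley.edu
nameserver 10.96.0.10
```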

### NFS

If your application does not need access to the filesystem then you can skip
this section. If your application needs to keep state, try to explore `MariaDB`
as a much simpler option before making use of `NFS`.

For Kubernetes to access the file system we need two objects: a
`PersistentVolume` and a `PersistentVolumeClaim`. The former maps a filesystem
to the cluster, and the latter is how a service asks to access that filesystem.
You will need to create the `PersistentVolume` in [Puppet][puppet] as
`<myapp>-nfs-pv.yaml`. In this example we'll create 30 gigabytes of readable
and writable storage.

> **Member:** Please have people try to use the https://github.com/ocf/nfs-provisioner instead. We set it up so you no longer need to make puppet changes to create PVs.

> **Member:** It's a lot easier. All you need to do is make a PVC like in the README with the correct storageClassName.

```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <myapp>-nfs-pv
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /opt/homes/services/<myapp>
    server: filehost.ocf.berkeley.edu
    readOnly: false
```
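Alternatively, per the review comments above, with the [nfs-provisioner](https://github.com/ocf/nfs-provisioner) you'd skip the Puppet `PersistentVolume` entirely and just write a claim with the right storage class; a sketch (the actual `storageClassName` is documented in that repo's README):

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <myapp>-pvc
spec:
  # Placeholder class name; use the one from the nfs-provisioner README.
  storageClassName: <nfs-storage-class>
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
```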

That's all you need to add to Puppet. Now you need to add the
`PersistentVolumeClaim` object to your service. Here we will claim all 30
gigabytes of the volume we added in Puppet.

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <myapp>-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  volumeName: "<myapp>-nfs-pv"
```

Under our `Deployment` we add a `volumes` sequence under the Pod template's
`spec`. The `claimName` must match the `name` of the `PersistentVolumeClaim`
we just created.

```
volumes:
  - name: <myapp-data>
    persistentVolumeClaim:
      claimName: <myapp>-pvc
```

Now we've set up the volume claim. Finally, we need to tell Kubernetes to mount
this `PVC` into our docker container. Under the `container` resource add:

```
volumeMounts:
  - mountPath: /target/path/in/my/container
    name: <myapp-data>
```
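Putting the two pieces together, the relevant slice of the `Deployment` ends up shaped like this (abridged; `volumeMounts` lives on the container, `volumes` on the Pod spec):

```
spec:
  template:
    spec:
      containers:
        - name: <container-name>
          # ... image, resources, ports as before ...
          volumeMounts:
            - mountPath: /target/path/in/my/container
              name: <myapp-data>
      volumes:
        - name: <myapp-data>
          persistentVolumeClaim:
            claimName: <myapp>-pvc
```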


> **Member:** Eventually you'll want to add a section for adding keycloak auth and handling secrets, but that's out of scope.

## Wrapping up

Now we have all the necessary configuration to deploy our service. To see if
everything works, we will deploy the service manually. On `supernova`, first
run `kinit`. This will obtain a [[kerberos|doc staff/backend/kerberos]] ticket
giving us access to the Kubernetes cluster. Now run:

```
kubectl create namespace <myapp>
kubectl apply -n <myapp> -f <myapp>.yaml
```

> **Member** (on `kubectl create namespace`): Technically you can include the yaml to create this within the .yaml file
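As the comment notes, the namespace can instead be declared at the top of the manifest itself, so a bare `kubectl apply` does everything; a minimal sketch:

```
apiVersion: v1
kind: Namespace
metadata:
  name: <myapp>
---
# ... Service, Deployment, and Ingress objects follow ...
```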

You can run `kubectl -n <myapp> get all` to watch Kubernetes create your
`Service` and `Deployment` objects.
> **Member (author):** This isn't relevant with Jenkins right? I thought this might be nice to include as example commands if someone was planning to deploy to dev-kubernetes.

> **Member:** I don't believe it's necessary because Jenkins will create the app namespace and all that when deploying, and I think it also does the service/deployment creation parts too but I'm not too familiar with that side of things.
>
> Maybe just mention before this that it's something you can do with your dev namespace to test out running the service somewhere?


### Production Services: Setting up DNS

If you are testing your deployment, use
`<myapp>.dev-kubernetes.ocf.berkeley.edu` as your Ingress host and that will
work immediately. When you deploy your service to production, make sure to
follow the instructions below.

The final step to make your service live is to create a DNS entry for your
Kubernetes service. You will need to clone the OCF DNS repo:

```
git clone git@github.com:ocf/dns.git
```

Since we are adding DNS for a Kubernetes service, first obtain admin
credentials with `kinit $USER/admin`, then run `ldapvi cn=lb`. Add a
`dnsCname` entry for your application. Run `make` and commit your changes to
GitHub. Once the DNS propagates and Puppet runs on all the Kubernetes masters
(wait about 30 minutes) your service will be accessible, with TLS, at
`<myapp>.ocf.berkeley.edu`. Congratulations!

> **Member:** This'll also need a kinit $USER/admin before the ldapvi (either as a separate command or just put as a prefix to the command) to actually be able to edit and submit changes to ldap.
>
> I think these editing DNS instructions should actually almost be their own docs. I looked and there's https://www.ocf.berkeley.edu/docs/staff/procedures/new-host/#h3_step-11-add-the-ldap-entry in the docs for adding a new host and DNS, but this is a bit different as it's just editing the existing lb-kubernetes pseudo-host instead. Anyway, something to think about, but this might be useful as a separate docs page that multiple places can link to.

> **Member:** Actually, lb-kubernetes no longer exists since there's no non-kubernetes version, it's just a cn=lb now (this is something that changed since the previous pull request).
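Inside `ldapvi`, the edit itself is just one more `dnsCname` attribute on the load balancer's entry; schematically (the neighboring attribute shown here is illustrative):

```
# cn=lb entry, as shown in the ldapvi buffer
dnsCname: templates
dnsCname: <myapp>
```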


## Reference Material

The best way to get services up-and-running is to read code for existing services.

[templates][templates-deploy]: The simplest possible deployment. `nginx` server with static content.

[kanboard][kanboard-deploy]: Project management software that makes use of `ldap` and mounts `nfs`.

[mastodon][mastodon-deploy] (Advanced): Applies custom patches, uses `ldap`, mounts `nfs`, has pods for `redis`, `sidekiq`, and `http-streaming`.

[kafka][kafka-deploy] (Advanced): Runs a `kafka` cluster inside of Kubernetes.

[templates]: https://templates.ocf.berkeley.edu
[dockerhub]: https://hub.docker.com
[puppet]: https://github.com/ocf/puppet/tree/master/modules/ocf_kubernetes/files/persistent-volume-nfs
> **Member** (on lines +373 to +375): nit: For these 3 links that are referred to earlier on in the doc, I'd put them closer to their respective paragraphs just so that it's easier to see what is actually being linked to closer to the text that's doing the linking. (That or using the []() link syntax instead)

[templates-deploy]: https://github.com/ocf/templates/tree/master/kubernetes
[kanboard-deploy]: https://github.com/ocf/kanboard/tree/master/kubernetes
[mastodon-deploy]: https://github.com/ocf/mastodon/tree/master/kubernetes
[kafka-deploy]: https://github.com/ocf/kafka/tree/master/kubernetes
> **Member (author)** (on lines +361 to +379): We have a lot more services now - this may not be that accurate. What are the best examples at the OCF today?

> **Member:** templates is still a really good simple example, and I think mastodon is a good complex example, but I'm not sure about the others. I do know kanboard isn't running any more (gives a 503), and I don't really know the state of kafka.