multiple environments + droneci #63

Open · wants to merge 1 commit into master

Conversation

@lrvick (Member) commented May 20, 2017

Initial stab at droneci deployment for #! in a dedicated VPC.

Most apps should probably end up in a production k8s cluster or similar, but CI is a bit special in that it will have god rights to manage other environments and peer to all environments/regions.

One of the features of DroneCI is the ability to manage Terraform changes, so long as DroneCI itself is not managed by Terraform, for obvious chicken-and-egg reasons.

Ideally, the CI environment will end up being the only one we have to maintain by hand, and even then very seldom.

In the meantime, however, I am not really clear on how to make the Makefile make sense with multiple environments.
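
(One possible shape, purely as a hedged sketch rather than anything this PR implements: one terraform root-config per environment, selected by a variable the Makefile passes through. Paths and names here are hypothetical.)

ENV="${ENV:-ci}"        # hypothetical: caller picks the environment
cd "environments/$ENV"  # one root-config + state backend per environment
terraform init          # picks up that environment's backend config
terraform plan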

None of this is deployed yet, and we will also need to rename the terraform buckets/topics by hand to make this work.

Mostly looking for feedback atm. I did deploy a very similar setup on my personal AWS account and all seems to work as expected.

@KellerFuchs (Member) left a comment

Cool stuff, and it goes way beyond what the title indicates.

Comments inline, but for me the main issues are:

  • Deploying CoreOS again, when we all agreed multiple times not to use it for the k8s cluster.
  • Baking KMS in several places (Terraform, Drone's config, ...); I would be happier with something vendor-neutral, rather than going with vendor lock-in in the initial setup.

@@ -0,0 +1,19 @@
#!/bin/bash
Member

#!/bin/sh
set -e

Member Author

Would that not potentially log plaintext secrets?

Member Author

Oh -e != -x
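
(For clarity, the two flags in question:)

set -e   # exit immediately when any command fails; adds no output
set -x   # trace mode: echoes each command, with its expanded arguments
         # (including any plaintext secrets), to stderr before running it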

#!/bin/bash
unset IFS
out_path=${1:-/out/secrets.env}
rm "$out_path" 2> /dev/null
Member

rm -f (so that set -e doesn't make it explode when $out_path doesn't exist yet).
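
i.e., something like:

rm -f "$out_path"   # -f: exits 0 even when the file doesn't exist yet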

for line in $(env | egrep '^KMS_'); do
kms_key="${line%%=*}"
key=${kms_key/KMS_/}
encrypted_value=${line#*=}
Member

What do ${line%%=*} and ${line#*=} do? Is that POSIX?

@lrvick (Member Author) · May 23, 2017

It is bash variable substitution: %%=* says "strip the first equals sign and everything after it", and #*= says "strip the first equals sign and everything before it". When using bash there is little need for tools like sed/awk for truncation and replacement.

I am only injecting the bash binary into the container, so KISS.
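
For example, given one line of env output (value made up):

line='KMS_DB_PASS=s3cret=='
kms_key="${line%%=*}"          # POSIX: strip from the first '=' to the end      -> KMS_DB_PASS
key="${kms_key/KMS_/}"         # bash-only pattern replacement                   -> DB_PASS
encrypted_value="${line#*=}"   # POSIX: strip up to and including the first '='  -> s3cret==

(The %% and # expansions are POSIX; the ${var/pattern/replacement} form is a bashism.)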

}

module "vpc" {
source = "github.com/terraform-community-modules/tf_aws_vpc?ref=v1.0.6"
Member

Does that mean that anyone with push access to github.com/terraform-community-modules/tf_aws_vpc gets code execution as the admin running terraform (or as our CD infra, eventually)?

Member Author

We could easily make this a submodule.
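
e.g. (hypothetical path, pinning the same tag under review):

git submodule add https://github.com/terraform-community-modules/tf_aws_vpc.git modules/tf_aws_vpc
git -C modules/tf_aws_vpc checkout v1.0.6   # updates then become explicit, reviewable commits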

module "vpc" {
source = "github.com/terraform-community-modules/tf_aws_vpc?ref=v1.0.6"
name = "ci"
cidr = "10.0.0.0/16"
Member

Why are we allocating the whole of 10/16 to the CI?
What's actually in use there? Please document the network segmentation.

Member Author

Just giving a 10.x/16 to each environment, which will initially be only Production and CI. Keeping them far apart leaves plenty of addresses for throw-away workers etc. in the future.

@lrvick (Member Author) · May 23, 2017

To further clarify: we can later (with a bit more $) upgrade this to run the drone agent containers in a dedicated autoscale group as dumb on-demand workers, which will churn through IP addresses pretty quickly when doing a bunch of builds/jobs at once.

name = "${var.name}"
}

data "aws_ami" "coreos_stable" {
Member

Nope, not doubling down on the CoreOS mistake.
We had this discussion already...

@lrvick (Member Author) · May 23, 2017

This is not a k8s cluster. This is a standalone single-server solution that will manage the k8s cluster, terraform, automated testing etc.

If you want to write and maintain your own standalone base image that can fully bootstrap everything I need via user-data, I am happy to use that. We could also just switch to your drop-in replacement later, once it exists, with a 5-line PR here.

In this case being super minimal is pretty ideal imo. All of this is totally fine to nuke & pave, which plays to the strengths of ContainerLinux.

Member Author

Also ContainerLinux is the officially supported base OS for k8s so I will probably opt to use that there too until your replacement exists.

@daurnimator (Member) · May 23, 2017

@daurnimator We would just need to make a base image that is all set up for that use case, and maintain that image, and the pipeline for that image.

That image would also probably need to be maintained by CI... and now we end up in a strange recursion situation.

IMO having CI itself use an upstream maintained distro that has everything we need to be hands-off is not a bad thing.

Member

This is a standalone single-server solution that will manage the k8s cluster, terraform, automated testing etc.

Can you explain that part?

@lrvick (Member Author) · May 23, 2017

@KellerFuchs droneci has the ability to deploy k8s jobs, run ansible playbooks, apply terraform configuration on merge to master, check signatures, build containers, and do pretty much anything we want to automate via the many available modules. The drone.yml files included in projects are ~ a superset of travis.

That is why I opted to put this in a dedicated VPC/environment with full machine/network isolation away from the production environments.

If we were later to have a develop environment or a staging environment etc., we would allow this VPC to have "VPC Peering" to any other environment, without giving those environments access to talk to each other.
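
(Roughly what that looks like per environment pair; a hedged sketch with placeholder IDs/CIDRs, since in practice Terraform would own these resources:)

aws ec2 create-vpc-peering-connection --vpc-id vpc-ci0000 --peer-vpc-id vpc-prod00  # CI requests the peering
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-000000        # accepted on the prod side
aws ec2 create-route --route-table-id rtb-ci0000 \
  --destination-cidr-block 10.1.0.0/16 \
  --vpc-peering-connection-id pcx-000000
# no prod<->staging peering is ever created, so those environments cannot reach each other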

Also, since this module will be managing terraform templates, it can't manage itself via terraform. Given that, this environment has a dedicated terraform root-config to manage drone and all the AWS resources it needs to function and generally maintain itself.

From there, it could then manage any other VPCs/environments via other terraform configs or ansible playbooks.

Really shooting for this to be very hands-off, thus the auto-DNS updating etc., so it can just terminate and rebuild itself on failure or for updates.

def handler(event, context):
ec2 = boto3.resource('ec2')
route53 = boto3.client('route53')
message = json.loads(event['Records'][0]['Sns']['Message'])
Member

How are we ensuring that only legit messages land in the SNS queue?

Member Author

Via the IAM rules that are also automatically deployed with this module.

Member Author

https://github.com/hashbang/admin-tools/pull/63/files/0b2fefab2d1f7f15d3cb0c19e742495cd4630186#diff-39ceb2360f3ad2c62e6b1444254fe8d6R114

That is the IAM policy, which is only made available to the AWS autoscale group API so it can report to the topic.

It is whitelist only, and nothing else is given write access to it.
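
(A minimal sketch of what a whitelist-only topic policy looks like; the ARNs here are placeholders and the real policy is the Terraform linked above:)

aws sns set-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:000000000000:ci-scaling \
  --attribute-name Policy \
  --attribute-value '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "autoscaling.amazonaws.com"},
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:us-east-1:000000000000:ci-scaling"
    }]
  }'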

Member

I had missed that part, thanks :3

}
}

resource "aws_security_group_rule" "all_internal_ingress" {
Member

Is there a way we can avoid allowing everything as ingress?
(Not a priority, and not required for initial deployment.)

@lrvick (Member Author) · May 23, 2017

We are free to add any rules that we want. This is just one of many rules we can include for any given use case.

This is just meant as a holding module for very common rules, to keep the syntax simple.

Custom rules can always be created in-place.
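
e.g. a tighter in-place rule (placeholder group ID):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 10.0.0.0/16   # HTTPS from inside the VPC only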

@lrvick (Member Author) commented May 23, 2017

@KellerFuchs I think you may have misunderstood the scope of this.

This is meant to be the single server CI environment and all attempts have been made to keep it as lean as possible.

I could use Vault but then we need to maintain a vault cluster for 2 secrets.

I could use pass but terraform preserves rendered configs in plaintext. KMS is the easiest/cheapest way to bootstrap secrets out-of-band to be made available only to a given AWS instance role JIT.
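
(For reference, the consuming side is one KMS call per secret; a hedged sketch of what the entrypoint does with each KMS_* value, names illustrative:)

echo "$encrypted_value" | base64 -d > /tmp/cipher.bin   # ciphertext arrives base64-encoded in the env
aws kms decrypt --ciphertext-blob fileb:///tmp/cipher.bin \
  --query Plaintext --output text | base64 -d           # only the instance role may kms:Decrypt this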

Once k8s is actually deployed in the production environment/VPC, the built-in etcd-based secret store is probably good enough for all of our services, with the exception of this one.

CI is the only machine with god rights so I wanted it to be very much isolated and self-contained away from the production environment.

@lrvick (Member Author) commented May 23, 2017

Re vendor lock-in: this is taking advantage of AWS built-ins in order to minimize cost, since this is standalone. We could develop alternate, more expensive secret handling such as Vault and deploy this same cloud-init to almost any other provider. But then Vault itself needs to be bootstrapped. The very top of the chain can't avoid having some lock-in, but we can make sure everything downstream in this case does not know about anything but drone, which is portable.

The production cluster will be k8s and the apps in that environment won't know anything about AWS.

@KellerFuchs (Member)

I think you may have misunderstood the scope of this.

I indeed did!

@KellerFuchs dismissed their stale review May 23, 2017 05:01

Misunderstood scope, should re-review.

@singlerider (Contributor)

I'm afraid it's been... nine years. Is this going anywhere?

@mayli commented Apr 4, 2018

nope

@lrvick (Member Author) commented Apr 4, 2018 via email
