
Test: Implemented ginkgo test against local kubeconfig #2015

Merged
3 commits merged into aws:main from testinfra on Jul 1, 2022

Conversation

@ellistarn ellistarn (Contributor) commented Jun 28, 2022

Fixes #

Description
Enables e2e tests against a local cluster.

  • Test resources are all tagged with a special label for cleanup
  • Made progress on CI infra setup (critical addons only, role name, etc.), but gaps remain
 ~/w/g/s/g/aws/karpenter │ on testinfra !1  make e2etests                                2 ✘ │ at 18:00:08
ginkgo -r test -- --environment-name=karpenter-test-infrastructure
Running Suite: Integration Suite
================================
Random Seed: 1656550823
Will run 1 of 1 specs

• [SLOW TEST:85.879 seconds]
Sanity Checks
/Users/etarn/workspaces/go/src/github.com/aws/karpenter/test/suites/integration/suite_test.go:30
  should provision nodes
  /Users/etarn/workspaces/go/src/github.com/aws/karpenter/test/suites/integration/suite_test.go:31
------------------------------

Ran 1 of 1 Specs in 87.084 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 1m33.141041379s
Test Suite Passed
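For readers wiring this up locally, here is a minimal sketch of how the --environment-name value passed after "--" in the make target above can reach the suite. It assumes Ginkgo v2 and a standard Go flag; the variable name and wiring are illustrative, not the PR's actual code.

package integration_test

import (
	"flag"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// environmentName receives the --environment-name value forwarded by ginkgo
// after "--"; tests use it to label and later locate their resources.
// Hypothetical wiring, not the PR's code.
var environmentName = flag.String("environment-name", "", "name of the test environment whose resources are labeled for cleanup")

func TestIntegration(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Integration Suite")
}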

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ellistarn ellistarn requested a review from a team as a code owner June 28, 2022 23:46
@ellistarn ellistarn requested a review from bwagner5 June 28, 2022 23:46
@netlify netlify bot commented Jun 28, 2022

Deploy Preview for karpenter-docs-prod canceled.

Latest commit: 53d0a77
Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/62bf328911719e0008e158d9

@ellistarn ellistarn marked this pull request as draft June 28, 2022 23:48
@ellistarn ellistarn force-pushed the testinfra branch 8 times, most recently from 4fa3c65 to 19db495 Compare June 30, 2022 00:48
@ellistarn ellistarn marked this pull request as ready for review June 30, 2022 00:56
@ellistarn ellistarn force-pushed the testinfra branch 2 times, most recently from 0210f59 to cbe1b62 Compare June 30, 2022 01:01
for _, namespace := range namespaces.Items {
wg.Add(1)
go func(object client.Object, namespace string) {
env.Expect(env.Client.DeleteAllOf(env, object,
Contributor

Maybe add &client.DeleteAllOfOptions{DeleteOptions: client.DeleteOptions{GracePeriodSeconds: ptr.Int64(0)}})

Contributor Author

I wanted this to be closer to natural usage of clusters and include normal deletion flows rather than force deletion.
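For context, here is a minimal sketch of the two deletion styles being discussed, assuming a controller-runtime client and the testing.karpenter.sh/environment-name label used elsewhere in this PR; the helper name and signature are hypothetical.

package e2e

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"knative.dev/pkg/ptr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupPods deletes the pods created by a test environment. With force set,
// it applies the reviewer's suggestion and skips the graceful termination
// period; otherwise it keeps the author's "natural deletion flow" default.
func cleanupPods(ctx context.Context, c client.Client, namespace, environmentName string, force bool) error {
	opts := []client.DeleteAllOfOption{
		client.InNamespace(namespace),
		// Only touch objects tagged by this test environment.
		client.MatchingLabels{"testing.karpenter.sh/environment-name": environmentName},
	}
	if force {
		opts = append(opts, &client.DeleteAllOfOptions{
			DeleteOptions: client.DeleteOptions{GracePeriodSeconds: ptr.Int64(0)},
		})
	}
	return c.DeleteAllOf(ctx, &v1.Pod{}, opts...)
}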

@ellistarn ellistarn force-pushed the testinfra branch 2 times, most recently from b468169 to 5a6c55a Compare June 30, 2022 04:41
@njtran njtran (Contributor) left a comment

Let’s discuss how we want to separate Tasks, ClusterTasks, and Pipelines in the filesystem. If someone imports all the files straight from GitHub, the layout you've chosen makes it easy to find the definitions in the repo.

As the number of configs increases, though, anyone applying them manually has to figure out how to find all the related Tasks and then apply them one by one. I think we should keep ClusterTasks in their own ClusterTasks folder and keep any Tasks in their respective suites. WDYT?

helm upgrade --install --namespace karpenter --create-namespace \
karpenter karpenter/karpenter \
--version ${KARPENTER_VERSION} \
helm upgrade --install --namespace karpenter --create-namespace karpenter --version v0.13.1 \
Contributor

Let's stick to v0.9.0 here. That's the highest version of Karpenter tested in KIT right now while they figure out this issue. Vending the Karpenter installation for KIT may be possible in the future, and that's where they would establish the Karpenter version we could use.

Contributor Author

This is a really bad precedent to set. We can't be behind on Karpenter in Karpenter testing. Can't we set a higher TTL?

Contributor Author

Vending the Karpenter installation for KIT may be possible in the future, and that's where they would establish the Karpenter version we could use.

Is this something like kitctl?

Contributor

Possibly. If this is an issue with pre-binding, then I want to be able to figure that out. This was a fairly large change in assumptions for Karpenter, so I don't think we'll be stuck here for long. We use a TTL of 180 right now.

- name: karpenter-version
default: latest
- name: account-id
default: "767520670908"
Contributor

Let's remove this

kind: ClusterConfig
metadata:
name: $(params.cluster-name)
version: 1.22
Contributor

eksctl v0.86.0 doesn't work with Kubernetes 1.22.

withOIDC: true
EOF
- name: kubeconfig
image: docker.io/njtran/scratch:latest
Contributor

You can use:

Suggested change
image: docker.io/njtran/scratch:latest
image: public.ecr.aws/d2q6r4b4/minimal:latest

TEMPOUT=$(mktemp)
curl -fsSL https://karpenter.sh/"$(params.karpenter-version)"/getting-started/getting-started-with-eksctl/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
--stack-name "Karpenter-$(params.cluster-name)" \
Contributor

This is going to collide with our Karpenter installation if the same cluster name is used as the management cluster. Can we make it the following? It may be better to change up the host-cluster setup too, so that we don't run the risk of IAM role name conflicts.

Suggested change
--stack-name "Karpenter-$(params.cluster-name)" \
--stack-name "KarpenterTesting-$(params.cluster-name)" \

Comment on lines 10 to 11
- name: repository
- name: path
Contributor

Can you add descriptions for these?

@ellistarn ellistarn force-pushed the testinfra branch 2 times, most recently from a86af3d to 3118682 Compare June 30, 2022 18:11
tags:
kit.aws/substrate: ${CLUSTER_NAME}
subnetSelector:
karpenter.sh/discovery: ${CLUSTER_NAME}
securityGroupSelector:
kubernetes.io/cluster/${CLUSTER_NAME}: owned
karpenter.sh/discovery: ${CLUSTER_NAME}
Contributor

Sorry for the churn. AWS Load Balancer discovers security groups using this tag; without that discovery, KIT guest clusters will not connect.

Suggested change
karpenter.sh/discovery: ${CLUSTER_NAME}
kubernetes.io/cluster/${CLUSTER_NAME}: owned

name: karpenter-integration-suite
spec:
params:
- name: commit
Contributor

Can you add a description for this?

Contributor Author

I think it's self-documenting as is.

tasks:
- name: github-action
taskRef:
name: github-action
Contributor

Can we reference the type of object in the name to make it clear?

Suggested change
name: github-action
name: github-action-cluster-task

@ellistarn ellistarn (Contributor Author) commented Jun 30, 2022

Not a big fan of adding types to object names; deployments/foo-deployment stutters.

pkg/test/pods.go Outdated
@@ -93,7 +93,7 @@ func Pod(overrides ...PodOptions) *v1.Pod {
Tolerations: options.Tolerations,
InitContainers: []v1.Container{{
Name: strings.ToLower(sequentialRandomName()),
Image: options.Image,
Image: "alpine",
Contributor

Why are we hardcoding the image here if we're allowing pod images to be defined in the first place? Looks like we also have a default above on L78?

Contributor Author

Pause doesn't work as an init container; I needed something that could exit.
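To illustrate the point, a hypothetical helper (not the PR's code): an init container must run to completion before the pod's main containers start, so the image, or an explicit command, has to exit, which pause never does.

package test

import v1 "k8s.io/api/core/v1"

// exitingInitContainer returns an init container whose process terminates
// immediately, so the pod can proceed to its main containers. The name and
// explicit command here are illustrative.
func exitingInitContainer() v1.Container {
	return v1.Container{
		Name:    "init",
		Image:   "alpine",
		Command: []string{"/bin/sh", "-c", "exit 0"},
	}
}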

ProviderRef: options.ProviderRef,
Taints: options.Taints,
StartupTaints: options.StartupTaints,
Labels: options.Labels,
Limits: &v1alpha5.Limits{Resources: options.Limits},
TTLSecondsAfterEmpty: ptr.Int64(10),
Contributor

Any reason why we're picking 10 here? Can we parameterize empty and expired? I can imagine these are functions we'll need to test too.

Contributor Author

Eventually, sure -- I wanted to default to 10 until we need to override it.
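A sketch of the parameterization being asked for, using a trimmed, hypothetical stand-in for the test options struct: honor an explicit value when a test sets one, and fall back to the 10-second default otherwise.

package test

import "knative.dev/pkg/ptr"

// ProvisionerOptions is a trimmed, hypothetical stand-in for the test options
// struct; only the field relevant to this thread is shown.
type ProvisionerOptions struct {
	TTLSecondsAfterEmpty *int64
}

// ttlSecondsAfterEmpty returns the caller's value when provided, otherwise the
// 10-second default currently hardcoded above.
func ttlSecondsAfterEmpty(options ProvisionerOptions) *int64 {
	if options.TTLSecondsAfterEmpty != nil {
		return options.TTLSecondsAfterEmpty
	}
	return ptr.Int64(10)
}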

provisioner.SetDefaults(context.Background())
_ = provisioner.Validate(context.Background())
if err := provisioner.Validate(context.Background()); err != nil {
logging.FromContext(context.TODO()).Info("TODO: Fix the tests that cause this")
Contributor

Can we report the error here too? I'm unsure whether this is a logged TODO meant to surface when running tests or a TODO for yourself.

Contributor Author

A ton of existing tests break if I fail here, since the error was originally squashed incorrectly. I wanted to log it and clean this up later, since I'm OOTO for a few days and want to unblock folks.
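For reference, a sketch of what reporting the error while keeping the log-and-continue behavior could look like; the function name is hypothetical and the v1alpha5 import path assumes the repo layout at the time of this PR.

package test

import (
	"context"

	"knative.dev/pkg/logging"

	"github.com/aws/karpenter/pkg/apis/provisioning/v1alpha5"
)

// validateBestEffort defaults and validates a provisioner, surfacing any
// validation error in the log line instead of dropping it, without failing
// the test run.
func validateBestEffort(ctx context.Context, provisioner *v1alpha5.Provisioner) {
	provisioner.SetDefaults(ctx)
	if err := provisioner.Validate(ctx); err != nil {
		logging.FromContext(ctx).Infof("TODO: fix the tests that cause this: %s", err)
	}
}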

@@ -1,6 +1,5 @@
export CLUSTER_NAME="${CLUSTER_NAME:-karpenter-test-cluster}"
export CLUSTER_NAME="${CLUSTER_NAME:-karpenter-test-infrastructure}"
Contributor

This would make the stack eksctl-karpenter-test-management-cluster.

Suggested change
export CLUSTER_NAME="${CLUSTER_NAME:-karpenter-test-infrastructure}"
export CLUSTER_NAME="${CLUSTER_NAME:-karpenter-test-management}"

Contributor Author

karpenter-test-management is less idiomatic than karpenter-test-infrastructure to me.

@njtran njtran (Contributor) left a comment

I think we have some disagreements on what the CI infra should be right now. The Ginkgo work looks great, and I'm happy to approve that if we scope this one down.

@njtran njtran (Contributor) left a comment

Nice! Just a nit or two


var (
EnvironmentLabelName = "testing.karpenter.sh/environment-name"
CleanableObjects = []client.Object{
Contributor

How do we ensure that Nodes are cleaned up? I think the TTL on the Provisioners is meant to do that, but if a test fails to delete its nodes (which could happen), we want our cleanup to include those as well.

Contributor

Ah, I forgot that Nodes are owned by Provisioners now, so this should be good.

Contributor Author

We definitely need to test scale-down, too. I think this library will need to be greatly expanded (e.g., waiting for things to clean up). This is just to get us started.
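A minimal sketch of the "wait for things to clean up" extension mentioned here, polling until no objects carrying the environment label remain; the helper name is hypothetical and only Pods are shown, whereas a real helper would iterate over CleanableObjects.

package e2e

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForCleanup polls until no pods tagged with the test environment label
// remain, returning an error on timeout.
func waitForCleanup(ctx context.Context, c client.Client, environmentName string) error {
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		pods := &v1.PodList{}
		if err := c.List(ctx, pods,
			client.MatchingLabels{"testing.karpenter.sh/environment-name": environmentName},
		); err != nil {
			return false, err
		}
		return len(pods.Items) == 0, nil
	})
}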


var _ = Describe("Sanity Checks", func() {
It("should provision nodes", func() {
provider := test.AWSNodeTemplate(test.AWSNodeTemplateOptions{AWS: v1alpha1.AWS{
Contributor

If the Karpenter being tested here doesn't have a defaultInstanceProfile, this should fail. I think it's fine to leave as is, but we should make sure we can test both modes eventually.

@njtran njtran merged commit 4c35c0f into aws:main Jul 1, 2022
@ellistarn ellistarn deleted the testinfra branch July 7, 2022 17:19