Debug why helm chart/config connector doesn't create a new service account #669

Closed
1 of 5 tasks
sgibson91 opened this issue Sep 10, 2021 · 11 comments
@sgibson91
Member

sgibson91 commented Sep 10, 2021

Description

While working on #662, it came to light that this should already be automated as part of our helm chart by including the appropriate config in the hub config file. However, when I tried to enable this for the Pangeo cluster, the service account (staging-user-sa@{{ pangeo project ID }}.iam.gserviceaccount.com) doesn't get created as expected. We need to work out why so we can automate this.

Annoyingly, there's no error message to help debug - I know it's not working because I can see from the Google Cloud IAM page that the service account is not being created.

For now, requester pays access for Pangeo is manually enabled so this isn't a blocking issue.

Value / benefit

Automating this deployment will save us a lot of manual busywork :)

Implementation details

No response

Tasks to complete

  • Try deploying the config to a cluster in two-eye-two-see-sandbox. This will help us figure out if it's a bug due to the special-casing of Pangeo or if it's a more general broken piece of config
  • Set up this infrastructure with Terraform as a stop-gap solution (rather than using config connector)
  • IDENTIFY AND SQUASH THE BUG! Update this with more details when they're available
  • ...
  • Document how the config snippet that enables requester pays access works

Updates

@sgibson91 sgibson91 changed the title from "Debug why helm chart/config connector doesn't create a new service account for Pangeo cluster" to "Debug why helm chart/config connector doesn't create a new service account" on Sep 10, 2021
@sgibson91
Member Author

I tried deploying this in two-eye-two-see-sandbox and the [email protected] service account is not created there either. So this is a general problem, not Pangeo-specific.

@damianavila
Contributor

Looking quickly at the pangeo yaml files:

I do not see scratchBucket stuff

scratchBucket:
  enabled: true

in the YAML files above, whereas I do see it, for instance, in the 2i2c cluster one: https://github.com/2i2c-org/pilot-hubs/blob/master/config/hubs/2i2c.cluster.yaml#L63-L70

And that "key" is expected to be enabled to create the service account and friends: https://github.com/2i2c-org/pilot-hubs/blob/master/hub-templates/basehub/templates/cloud-resources/gcp/service-account.yaml#L1

Am I missing some other Pangeo config file where scratchBucket was enabled?

@sgibson91
Member Author

sgibson91 commented Sep 11, 2021

Sorry, it is not yet in the config file in master. But when I DO deploy with that config present - it doesn't work.

I will share the config I deployed to set up the test cluster in two-eye-two-see-sandbox in the morning.

@sgibson91
Member Author

sgibson91 commented Sep 11, 2021

.tfvars file I used to set up the cluster in two-eye-two-see-sandbox:

prefix     = "test-hubs"
project_id = "two-eye-two-see-sandbox"

core_node_machine_type = "n1-highmem-4"

# Multi-tenant cluster, network policy is required to enforce separation between hubs
enable_network_policy    = true

# Some hubs want a storage bucket, so we need to have config connector enabled
config_connector_enabled = true

notebook_nodes = {
  "user" : {
    min : 0,
    max : 2,
    machine_type : "n1-standard-4",
    labels: { }
  }
}

user_buckets = [
  "test",
]

Config file I used to deploy a test hub onto the cluster - you can see the cloudResources block is enabled at the bottom:

name: test-hubs
provider: gcp
gcp:
  key: secrets/test-hubs.json
  project: two-eye-two-see-sandbox
  cluster: test-hubs-cluster
  zone: us-central1-b
support:
  config:
    grafana:
      ingress:
        hosts:
          - grafana.test-hubs.2i2c.cloud
        tls:
          - secretName: grafana-tls
            hosts:
              - grafana.test-hubs.2i2c.cloud
    ingress-nginx:
      controller:
        admissionWebhooks:
          enabled: false
hubs:
  - name: test
    domain: test.test-hubs.2i2c.cloud
    template: basehub
    auth0:
      connection: github
    config:
        jupyterhub:
          hub:
            config:
              Authenticator:
                allowed_users: &test_users
                  - <staff_github_ids>
                admin_users: *test_users
          singleuser:
            image:
              name: pangeo/pangeo-notebook
              tag: master
          custom:
            cloudResources:
              provider: gcp
              gcp:
                projectId: two-eye-two-see-sandbox
              scratchBucket:
                enabled: true

I left this up overnight, as I saw in the config connector docs:

Config Connector's controllers eventually reconcile your environment with your desired state

which led me to wonder if I was just being impatient again 😉

However, you can see here that there is no service account with the expected name [email protected], though the annotation is correctly applied.

$ kubectl -n test describe serviceaccount user-sa
Name:                user-sa
Namespace:           test
Labels:              app.kubernetes.io/managed-by=Helm
Annotations:         iam.gke.io/gcp-service-account: [email protected]
                     meta.helm.sh/release-name: test
                     meta.helm.sh/release-namespace: test
Image pull secrets:  <none>
Mountable secrets:   ***
Tokens:              ***
Events:              <none>

@yuvipanda
Member

There should be pods in the config-connector namespace that actually make this work - so look at the logs there to see how those pods are doing? The pilot-hubs cluster also has this working, so we can use that to evaluate differences.

@sgibson91
Member Author

sgibson91 commented Oct 14, 2021

On the Pangeo hubs, I manually created the staging-user-sa and prod-user-sa that config connector is expecting since we have now enabled that config in #748 (for scratch buckets) and the original requester-pays-sa annotation will be overwritten in CI now.

Hopefully, once I have fought #653 and #741, I will have some more time to come back to this.

@sgibson91 sgibson91 self-assigned this Nov 8, 2021
@sgibson91
Member Author

sgibson91 commented Nov 8, 2021

I have deployed the cluster and hub again. In the namespace configconnector-operator-system there is a pod called configconnector-operator-0. Its logs were quite long, so I have also piped them through grep serviceaccount and will add that output in a separate comment. The full log output by itself was still too long for GitHub's comment character limit 😱

@sgibson91
Member Author

Log output grepped for `serviceaccount`
{"severity":"error","msg":"Unable to get resource","name":"cnrm-deletiondefender","error":"unable to get mapping for resource: serviceaccounts \"cnrm-deletiondefender\" not found"}
{"severity":"error","msg":"Unable to get resource","name":"cnrm-resource-stats-recorder","error":"unable to get mapping for resource: serviceaccounts \"cnrm-resource-stats-recorder\" not found"}
{"severity":"error","msg":"Unable to get resource","name":"cnrm-webhook-manager","error":"unable to get mapping for resource: serviceaccounts \"cnrm-webhook-manager\" not found"}
customresourcedefinition.apiextensions.k8s.io/iamserviceaccountkeys.iam.cnrm.cloud.google.com configured
customresourcedefinition.apiextensions.k8s.io/iamserviceaccounts.iam.cnrm.cloud.google.com configured
serviceaccount/cnrm-deletiondefender created
serviceaccount/cnrm-resource-stats-recorder created
serviceaccount/cnrm-webhook-manager created
customresourcedefinition.apiextensions.k8s.io/iamserviceaccountkeys.iam.cnrm.cloud.google.com configured
customresourcedefinition.apiextensions.k8s.io/iamserviceaccounts.iam.cnrm.cloud.google.com configured
serviceaccount/cnrm-deletiondefender unchanged
serviceaccount/cnrm-resource-stats-recorder unchanged
serviceaccount/cnrm-webhook-manager unchanged
[The five lines above repeat verbatim 23 more times in the log.]
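
The grepped output mixes JSON log records with plain status lines. Below is a minimal Python sketch for pulling out only the error-severity records, assuming the log has been saved as text. The function name `error_records` and the `sample` string are hypothetical, and the sample reuses lines from the output above.

```python
import json


def error_records(log_text):
    """Return parsed JSON records whose severity is "error"; skip non-JSON lines."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain status lines such as "serviceaccount/... unchanged"
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed JSON line
        if isinstance(record, dict) and record.get("severity") == "error":
            records.append(record)
    return records


# Hypothetical sample built from lines in the grepped output.
sample = "\n".join([
    '{"severity":"error","msg":"Unable to get resource","name":"cnrm-webhook-manager","error":"unable to get mapping for resource: serviceaccounts \\"cnrm-webhook-manager\\" not found"}',
    "serviceaccount/cnrm-webhook-manager created",
    "serviceaccount/cnrm-webhook-manager unchanged",
])

for record in error_records(sample):
    print(record["name"], "->", record["msg"])
```

In the output above, the only error records name Config Connector's own Kubernetes service accounts (cnrm-deletiondefender, cnrm-resource-stats-recorder, cnrm-webhook-manager), and the lines that follow report those accounts as created, so these errors may not relate to the missing hub service account.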

@sgibson91 sgibson91 removed their assignment Nov 24, 2021
@sgibson91
Member Author

sgibson91 commented Nov 24, 2021

Other things to think about to get this working on the Pangeo cluster:

  • We should move the creation of buckets out of Terraform and into config_connector so we get a scratch bucket per scratch-bucket-enabled hub
  • Switch naming convention of scratch buckets to match the form ${gcp_project_id}-${helm_release_name}-scratch-bucket as is expected here:
    bucket_name = f'{project_id}-{release}-scratch-bucket'
  • The service accounts also need the roles/storage.objectCreator role in order to write to the bucket
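
As a sketch of the naming conventions in the list above: the bucket name follows the f-string quoted there, while the `{release}-user-sa` pattern for the service account is an assumption inferred from the staging-user-sa and prod-user-sa names mentioned earlier in this thread. Both function names are hypothetical. The length limits checked are Google Cloud's documented ones as I understand them (bucket names up to 63 characters, service account IDs between 6 and 30 characters).

```python
def scratch_bucket_name(project_id, release):
    """Bucket name in the form expected by the deployer snippet quoted above."""
    name = f"{project_id}-{release}-scratch-bucket"
    if len(name) > 63:
        raise ValueError(f"bucket name too long ({len(name)} > 63): {name}")
    return name


def user_service_account_email(project_id, release):
    """Assumed naming: <release>-user-sa in the hub's GCP project."""
    account_id = f"{release}-user-sa"
    if not 6 <= len(account_id) <= 30:
        raise ValueError(f"service account ID must be 6-30 characters: {account_id}")
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


print(scratch_bucket_name("two-eye-two-see-sandbox", "test"))
print(user_service_account_email("two-eye-two-see-sandbox", "staging"))
```

Deriving both names from the same two inputs keeps the bucket and the service account in step for each hub, which is what a bucket per scratch-bucket-enabled hub would need.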

@choldgraf
Member

Update: plan to use Terraform

In a team meeting, we discussed our options here and decided that using Terraform is the right way to go rather than the GCP "ConfigConnector" service, because this will be easier to do quickly.

@yuvipanda yuvipanda self-assigned this Mar 18, 2022
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Mar 18, 2022
The GKE config connector was helpful in letting us deploy
Google Cloud Service Accounts with permissions for cloud storage
directly just from helm. However, it has been difficult to debug,
and in 2i2c-org#669
we decided to move away from it and towards creating these
cloud resources via Terraform.

This commit adds:
- Terraform code that will create a Google Service Account,
  bind it to a given Kubernetes Service Account, for a list of
  hub namespaces passed in. This means that some hub initial deployments
  now *can not be done just with CD*, but need manual work with
  terraform. I think this would be any hub that wants to use
  requester pays or scratch buckets. This would need to be
  documented.
- Move meom-ige to use this new scheme. metadata concealment
  (https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment)
  which is what we were using earlier as alternative to config-connector
  + workload identity, is no longer supported by the terraform
  google provider. In b7b42ce,
  we changed the default from 'SECURE' to 'UNSPECIFIED', but
  it looks like 'UNSPECIFIED' really means 'use workload identity'
  haha. When 2i2c-org#1124 was
  deployed to meom-ige yesterday, it seems to have enabled workload
  identity, causing cloud access to stop working, leading to
  https://2i2c.freshdesk.com/a/tickets/107. Further investigation on
  what happened here is needed, but I've currently fixed it by
  just deploying this change for meom-ige.
- All hubs are given access to all buckets we create. This is
  inadequate, and needs to be more fine-grained.

Ref 2i2c-org#669
Ref 2i2c-org#1046
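
The binding described in this commit message relies on GKE Workload Identity. The Python sketch below only spells out the identifiers involved and is not the Terraform code from the commit. The function name and the example email are hypothetical, and `user-sa` is the Kubernetes service account name from the `kubectl describe` output earlier in the thread.

```python
def workload_identity_binding(project_id, namespace, gsa_email, ksa_name="user-sa"):
    """Describe the IAM binding and annotation that tie a KSA to a GSA."""
    return {
        # Role granted on the Google Service Account (GSA)
        "role": "roles/iam.workloadIdentityUser",
        # Workload Identity member string for the Kubernetes Service Account (KSA)
        "member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
        # Annotation expected on the KSA, as seen in the describe output
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa_email},
    }


# Hypothetical values for illustration only.
binding = workload_identity_binding(
    project_id="two-eye-two-see-sandbox",
    namespace="test",
    gsa_email="example-sa@two-eye-two-see-sandbox.iam.gserviceaccount.com",
)
for key, value in binding.items():
    print(key, "=", value)
```

The annotation alone does not create the GSA. Both the account and the IAM binding must exist on the Google Cloud side, which is the part this commit moves to Terraform.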
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Mar 29, 2022
@yuvipanda
Member

This is done except for #1153! \o/ Thank you for blazing the trail here, @sgibson91!
