Switch to using pd-balanced for all user & dask nodes #1124
�[0m�[32mSwitched to workspace "pilot-hubs".�[0m google_artifact_registry_repository.registry: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1/repositories/pilot-hubs-registry] google_service_account.cluster_sa: Refreshing state... [id=projects/two-eye-two-see/serviceAccounts/[email protected]] google_service_account.cd_sa: Refreshing state... [id=projects/two-eye-two-see/serviceAccounts/[email protected]] google_project_iam_custom_role.identify_project_role: Refreshing state... [id=projects/two-eye-two-see/roles/pilot_hubs_user_sa_role] google_project_iam_member.cd_sa_roles["roles/artifactregistry.writer"]: Refreshing state... [id=two-eye-two-see/roles/artifactregistry.writer/serviceAccount:[email protected]] google_project_iam_member.cd_sa_roles["roles/container.admin"]: Refreshing state... [id=two-eye-two-see/roles/container.admin/serviceaccount:[email protected]] google_project_iam_member.cluster_sa_roles["roles/logging.logWriter"]: Refreshing state... [id=two-eye-two-see/roles/logging.logWriter/serviceaccount:[email protected]] google_project_iam_member.cluster_sa_roles["roles/monitoring.metricWriter"]: Refreshing state... [id=two-eye-two-see/roles/monitoring.metricWriter/serviceAccount:[email protected]] google_service_account_key.cd_sa: Refreshing state... [id=projects/two-eye-two-see/serviceAccounts/[email protected]/keys/ee245c75f6c3fa5a36ae59597b356f4c6da80334] google_project_iam_member.cluster_sa_roles["roles/stackdriver.resourceMetadata.writer"]: Refreshing state... [id=two-eye-two-see/roles/stackdriver.resourceMetadata.writer/serviceAccount:[email protected]] google_project_iam_member.cluster_sa_roles["roles/artifactregistry.reader"]: Refreshing state... [id=two-eye-two-see/roles/artifactregistry.reader/serviceAccount:[email protected]] google_project_iam_member.cluster_sa_roles["roles/monitoring.viewer"]: Refreshing state... 
[id=two-eye-two-see/roles/monitoring.viewer/serviceAccount:[email protected]] google_project_iam_member.identify_project_binding: Refreshing state... [id=two-eye-two-see/projects/two-eye-two-see/roles/pilot_hubs_user_sa_role/serviceAccount:[email protected]] google_container_cluster.cluster: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster] google_container_node_pool.notebook["user"]: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster/nodePools/nb-user] google_container_node_pool.notebook["paleo"]: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster/nodePools/nb-paleo] google_container_node_pool.dask_worker["worker"]: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster/nodePools/dask-worker] google_container_node_pool.core: Refreshing state... [id=projects/two-eye-two-see/locations/us-central1-b/clusters/pilot-hubs-cluster/nodePools/core-pool] |
�[0m�[1mgoogle_artifact_registry_repository.registry: Refreshing state... [id=projects/cb-1003-1696/locations/us-central1/repositories/cb-registry]�[0m �[0m�[1mgoogle_service_account.cd_sa: Refreshing state... [id=projects/cb-1003-1696/serviceAccounts/[email protected]]�[0m �[0m�[1mgoogle_service_account.cluster_sa: Refreshing state... [id=projects/cb-1003-1696/serviceAccounts/[email protected]]�[0m �[0m�[1mgoogle_project_iam_custom_role.identify_project_role: Refreshing state... [id=projects/cb-1003-1696/roles/cb_user_sa_role]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/artifactregistry.writer"]: Refreshing state... [id=cb-1003-1696/roles/artifactregistry.writer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/container.admin"]: Refreshing state... [id=cb-1003-1696/roles/container.admin/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_service_account_key.cd_sa: Refreshing state... [id=projects/cb-1003-1696/serviceAccounts/[email protected]/keys/fc50105d4d954fbe97e4453d0988df9dc88a762a]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/logging.logWriter"]: Refreshing state... [id=cb-1003-1696/roles/logging.logWriter/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/artifactregistry.reader"]: Refreshing state... [id=cb-1003-1696/roles/artifactregistry.reader/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.metricWriter"]: Refreshing state... [id=cb-1003-1696/roles/monitoring.metricWriter/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.viewer"]: Refreshing state... [id=cb-1003-1696/roles/monitoring.viewer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/stackdriver.resourceMetadata.writer"]: Refreshing state... 
[id=cb-1003-1696/roles/stackdriver.resourceMetadata.writer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.identify_project_binding: Refreshing state... [id=cb-1003-1696/projects/cb-1003-1696/roles/cb_user_sa_role/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_container_cluster.cluster: Refreshing state... [id=projects/cb-1003-1696/locations/us-central1-b/clusters/cb-cluster]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["worker"]: Refreshing state... [id=projects/cb-1003-1696/locations/us-central1-b/clusters/cb-cluster/nodePools/dask-worker]�[0m �[0m�[1mgoogle_container_node_pool.notebook["user"]: Refreshing state... [id=projects/cb-1003-1696/locations/us-central1-b/clusters/cb-cluster/nodePools/nb-user]�[0m �[0m�[1mgoogle_container_node_pool.core: Refreshing state... [id=projects/cb-1003-1696/locations/us-central1-b/clusters/cb-cluster/nodePools/core-pool]�[0m �[0m �[1m�[36mNote:�[0m�[1m Objects have changed outside of Terraform�[0m |
�[0m�[1mgoogle_service_account.cd_sa: Refreshing state... [id=projects/meom-ige-cnrs/serviceAccounts/[email protected]]�[0m �[0m�[1mgoogle_project_iam_custom_role.identify_project_role: Refreshing state... [id=projects/meom-ige-cnrs/roles/meom_ige_user_sa_role]�[0m �[0m�[1mgoogle_service_account.cluster_sa: Refreshing state... [id=projects/meom-ige-cnrs/serviceAccounts/[email protected]]�[0m �[0m�[1mgoogle_storage_bucket.user_buckets["data"]: Refreshing state... [id=meom-ige-data]�[0m �[0m�[1mgoogle_storage_bucket.user_buckets["scratch"]: Refreshing state... [id=meom-ige-scratch]�[0m �[0m�[1mgoogle_artifact_registry_repository.registry: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1/repositories/meom-ige-registry]�[0m �[0m�[1mgoogle_service_account_key.cd_sa: Refreshing state... [id=projects/meom-ige-cnrs/serviceAccounts/[email protected]/keys/6fb3ca469f000201aa0e2ea24ff02515324b1ffe]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/artifactregistry.writer"]: Refreshing state... [id=meom-ige-cnrs/roles/artifactregistry.writer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/container.admin"]: Refreshing state... [id=meom-ige-cnrs/roles/container.admin/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_storage_bucket_iam_member.member["data"]: Refreshing state... [id=b/meom-ige-data/roles/storage.admin/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/stackdriver.resourceMetadata.writer"]: Refreshing state... [id=meom-ige-cnrs/roles/stackdriver.resourceMetadata.writer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_storage_bucket_iam_member.member["scratch"]: Refreshing state... [id=b/meom-ige-scratch/roles/storage.admin/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/artifactregistry.reader"]: Refreshing state... 
[id=meom-ige-cnrs/roles/artifactregistry.reader/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.metricWriter"]: Refreshing state... [id=meom-ige-cnrs/roles/monitoring.metricWriter/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.viewer"]: Refreshing state... [id=meom-ige-cnrs/roles/monitoring.viewer/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_container_cluster.cluster: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/logging.logWriter"]: Refreshing state... [id=meom-ige-cnrs/roles/logging.logWriter/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_project_iam_member.identify_project_binding: Refreshing state... [id=meom-ige-cnrs/projects/meom-ige-cnrs/roles/meom_ige_user_sa_role/serviceaccount:[email protected]]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["small"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/dask-small]�[0m �[0m�[1mgoogle_container_node_pool.notebook["huge"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/nb-huge]�[0m �[0m�[1mgoogle_container_node_pool.core: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/core-pool]�[0m �[0m�[1mgoogle_container_node_pool.notebook["small"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/nb-small]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["very-large"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/dask-very-large]�[0m �[0m�[1mgoogle_container_node_pool.notebook["large"]: Refreshing state... 
[id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/nb-large]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["medium"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/dask-medium]�[0m �[0m�[1mgoogle_container_node_pool.notebook["medium"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/nb-medium]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["huge"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/dask-huge]�[0m �[0m�[1mgoogle_container_node_pool.notebook["very-large"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/nb-very-large]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["large"]: Refreshing state... [id=projects/meom-ige-cnrs/locations/us-central1-b/clusters/meom-ige-cluster/nodePools/dask-large]�[0m �[0m �[1m�[36mNote:�[0m�[1m Objects have changed outside of Terraform�[0m |
�[0m�[1mgoogle_artifact_registry_repository.registry: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1/repositories/pangeo-hubs-registry]�[0m �[0m�[1mgoogle_project_iam_custom_role.identify_project_role: Refreshing state... [id=projects/pangeo-integration-te-3eea/roles/pangeo_hubs_user_sa_role]�[0m �[0m�[1mgoogle_service_account.cd_sa: Refreshing state... [id=projects/pangeo-integration-te-3eea/serviceAccounts/pangeo-hubs-cd-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_service_account.cluster_sa: Refreshing state... [id=projects/pangeo-integration-te-3eea/serviceAccounts/pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_storage_bucket.user_buckets["pangeo-scratch"]: Refreshing state... [id=pangeo-hubs-pangeo-scratch]�[0m �[0m�[1mgoogle_service_account_key.cd_sa: Refreshing state... [id=projects/pangeo-integration-te-3eea/serviceAccounts/pangeo-hubs-cd-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com/keys/a49552f97c52549eff834dd54e80865715c0b953]�[0m �[0m�[1mgoogle_storage_bucket_iam_member.member["pangeo-scratch"]: Refreshing state... [id=b/pangeo-hubs-pangeo-scratch/roles/storage.admin/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/container.admin"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/container.admin/serviceAccount:pangeo-hubs-cd-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cd_sa_roles["roles/artifactregistry.writer"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/artifactregistry.writer/serviceAccount:pangeo-hubs-cd-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.viewer"]: Refreshing state... 
[id=pangeo-integration-te-3eea/roles/monitoring.viewer/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/stackdriver.resourceMetadata.writer"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/stackdriver.resourceMetadata.writer/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/artifactregistry.reader"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/artifactregistry.reader/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/logging.logWriter"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/logging.logWriter/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_project_iam_member.cluster_sa_roles["roles/monitoring.metricWriter"]: Refreshing state... [id=pangeo-integration-te-3eea/roles/monitoring.metricWriter/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_filestore_instance.homedirs[0]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/instances/pangeo-hubs-homedirs]�[0m �[0m�[1mgoogle_compute_router.router[0]: Refreshing state... [id=projects/pangeo-integration-te-3eea/regions/us-central1/routers/pangeo-hubs-router]�[0m �[0m�[1mgoogle_compute_firewall.iap_ssh_ingress[0]: Refreshing state... [id=projects/pangeo-integration-te-3eea/global/firewalls/allow-ssh]�[0m �[0m�[1mgoogle_project_iam_member.identify_project_binding: Refreshing state... [id=pangeo-integration-te-3eea/projects/pangeo-integration-te-3eea/roles/pangeo_hubs_user_sa_role/serviceAccount:pangeo-hubs-cluster-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com]�[0m �[0m�[1mgoogle_container_cluster.cluster: Refreshing state... 
[id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster]�[0m �[0m�[1mgoogle_compute_router_nat.nat[0]: Refreshing state... [id=pangeo-integration-te-3eea/us-central1/pangeo-hubs-router/pangeo-hubs-router-nat]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["large"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-large]�[0m �[0m�[1mgoogle_container_node_pool.notebook["small"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/nb-small]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["small"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-small]�[0m �[0m�[1mgoogle_container_node_pool.notebook["medium"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/nb-medium]�[0m �[0m�[1mgoogle_container_node_pool.notebook["large"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/nb-large]�[0m �[0m�[1mgoogle_container_node_pool.dask_worker["medium"]: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/dask-medium]�[0m �[0m�[1mgoogle_container_node_pool.core: Refreshing state... [id=projects/pangeo-integration-te-3eea/locations/us-central1-b/clusters/pangeo-hubs-cluster/nodePools/core-pool]�[0m �[0m �[1m�[36mNote:�[0m�[1m Objects have changed outside of Terraform�[0m |
This is a destructive-ish change, so I'll have to apply this carefully and well timed.
cc @mmcky - we're playing around with speeding up the node creation a bit!
This LGTM - good luck with the apply!
🤞🏼
This looks good to me - I'll defer to you all on the technical strategy here.
My one question is whether we can document the rationale for these decisions somewhere. For example, why pd-balanced for some and pd-ssd for others? Normally I would just say this can be a one-line comment above the config, but since this is spread out across many clusters I feel like this would be cumbersome to change.
Can anybody think of a quick way to document this? If not, I don't wanna block the PR on this, I just worry that we have lots of "implicit strategy" encoded in some of these infrastructure decisions that might be hard to reason with in the future without the context
@choldgraf i had forgotten to push a commit - I have just made the decision for everything to just use |
We were using standard disk to save costs, but that brings with it much slower node startup time, as images being pulled take time. pd-balanced is a newer alternative to pure SSD disks that is not as expensive, but provides much better performance than pd-standard. I think the extra cost is worth the performance on all these cases.
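As a rough illustration of the change described above, here is a minimal Terraform sketch of a notebook node pool that sets its boot disk type. The resource addresses (`google_container_node_pool.notebook`, `google_container_cluster.cluster`) and the `nb-` name prefix mirror the plan output earlier in this thread; the `notebook_nodes` variable and its exact shape are assumptions for illustration, and the cluster resource is assumed to be defined elsewhere in the same module.

```hcl
# Hypothetical input variable; field names follow the reviewed config
# (min, max, machine_type, labels) but the real module may differ.
variable "notebook_nodes" {
  type = map(object({
    min          = number
    max          = number
    machine_type = string
    labels       = map(string)
  }))
}

resource "google_container_node_pool" "notebook" {
  for_each = var.notebook_nodes

  name = "nb-${each.key}"
  # Assumes google_container_cluster.cluster is declared elsewhere in this module.
  cluster  = google_container_cluster.cluster.name
  location = google_container_cluster.cluster.location

  autoscaling {
    min_node_count = each.value.min
    max_node_count = each.value.max
  }

  node_config {
    machine_type = each.value.machine_type
    labels       = each.value.labels

    # Boot disk type for every node in the pool:
    # "pd-standard", "pd-balanced", or "pd-ssd".
    disk_type = "pd-balanced"
  }
}
```

Depending on the provider version, changing `disk_type` on an existing pool may force the pool to be recreated, which would fit the "destructive-ish" description earlier in the thread.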
I've taken advantage of a lull in usage to deploy this to the pilot-hubs and meom-ige clusters.
Need to deploy this to cloudbank and pangeo-hubs still. Both have users right now.
This looks great to me, thanks Yuvi! And @mmcky it sounds like these changes have already been deployed to the cluster that you're using, so maybe keep an eye on any reports people make about slower starts and see if we are a bit faster than the 6-8 minute waits you were reporting before!
  },
  "medium" : {
    min : 0,
    max : 20,
    machine_type : "n1-standard-8",
-   labels: {}
+   labels: {},
+   disk_type: "pd-ssd"
just wanna confirm that you're intentionally making these pd-ssd, since you mentioned in another comment that you had missed some of them
Good catch, @choldgraf! Fixed :)
thanks @choldgraf and @yuvipanda -- will report back from the user base. Really appreciate this.
@mmcky thanks for helping us make the service better for everyone!
This was a no-op anyway, as we had hardcoded the disk types
The GKE config connector was helpful in letting us deploy Google Cloud Service Accounts with permissions for cloud storage directly just from helm. However, it has been difficult to debug, and in 2i2c-org#669 we decided to move away from it and towards creating these cloud resources via Terraform.

This commit adds:

- Terraform code that will create a Google Service Account, bind it to a given Kubernetes Service Account, for a list of hub namespaces passed in. This means that some hub initial deployments now *can not be done just with CD*, but need manual work with terraform. I think this would be any hub that wants to use requestor pays or scratch buckets. This would need to be documented.
- Move meom-ige to use this new scheme. Metadata concealment (https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment), which is what we were using earlier as an alternative to config-connector + workload identity, is no longer supported by the terraform google provider. In b7b42ce, we changed the default from 'SECURE' to 'UNSPECIFIED', but it looks like 'UNSPECIFIED' really means 'use workload identity' haha. When 2i2c-org#1124 was deployed to meom-ige yesterday, it seems to have enabled workload identity, causing cloud access to stop working, leading to https://2i2c.freshdesk.com/a/tickets/107. Further investigation on what happened here is needed, but I've currently fixed it by just deploying this change for meom-ige.
- All hubs are given access to all buckets we create. This is inadequate, and needs to be more fine grained.

Ref 2i2c-org#669
Ref 2i2c-org#1046
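The first item in that commit message (one Google Service Account per hub namespace, bound to a Kubernetes Service Account) could look roughly like the sketch below. This is not the code from the referenced commit: the variable names, the `-workload-sa` suffix, and the Kubernetes Service Account name `user-sa` are all hypothetical. The resource types and the `roles/iam.workloadIdentityUser` role are the standard ones for GKE Workload Identity.

```hcl
# Hypothetical inputs; the real module's variable names may differ.
variable "project_id" {
  type = string
}

variable "hub_namespaces" {
  type        = set(string)
  description = "Hub namespaces that need their own Google Service Account"
}

# One Google Service Account (GSA) per hub namespace.
# Note: account_id must be 6-30 characters, so long namespace
# names would need shortening.
resource "google_service_account" "workload_sa" {
  for_each = var.hub_namespaces

  project      = var.project_id
  account_id   = "${each.value}-workload-sa"
  display_name = "Workload identity SA for hub ${each.value}"
}

# Let the Kubernetes Service Account (KSA) in each hub namespace
# impersonate the matching GSA via Workload Identity.
# "user-sa" is an assumed KSA name.
resource "google_service_account_iam_binding" "workload_identity" {
  for_each = var.hub_namespaces

  service_account_id = google_service_account.workload_sa[each.value].name
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "serviceAccount:${var.project_id}.svc.id.goog[${each.value}/user-sa]",
  ]
}
```

For the binding to take effect, the Kubernetes Service Account also needs the `iam.gke.io/gcp-service-account` annotation pointing at the GSA's email, which would be set on the Kubernetes side (for example from the helm chart).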
This isn't deployed on pangeo-hubs and cloudbank, but am going to merge this now to make further PR work that touches this easier.
@yuvipanda not sure if this is related to this merge. But for the Friday @ 3pm (AEST) Tutorial Lab for the ANU course the whole class went down and weren't able to use the anu jupyterhub. All the other tutorials have been fine. I am wondering if this time slot may have coincided with an upgrade (overnight US time). Just thought I'd feed that back.
@mmcky oh no, I'm so sorry you ran into issues! I just investigated the hub and it is not related to this PR - #1135 has more information. It looks like you basically ran into #1103. This has happened twice now this week, so I'll bump up the priority in getting the fix deployed. Sorry for the inconvenience, and I'll let you know once we deploy the fix!
Thanks so much @yuvipanda love your work.
@yuvipanda, is this follow-up task living in a new issue?
@damianavila just opened up #1153
We were using standard disk to save costs, but that brings with
it much slower node startup time, as images being pulled take time.
pd-balanced is a newer alternative to pure SSD disks that is
not as expensive, but provides much better performance than pd-standard.
I think the extra cost is worth the performance on all these
cases.
This was based on feedback on new node spinup performance in
#991 (comment)