Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to enable Requester Pays access to Pangeo data for GCP hubs #662

Merged
merged 14 commits into from
Sep 11, 2021

Conversation

sgibson91
Copy link
Member

@sgibson91 sgibson91 commented Sep 7, 2021

This PR adds documentation on the steps required to enable Requester Pays access to other Google Cloud storage buckets, such as the Pangeo data. These steps have already been implemented for the Pangeo staging hub.

fixes #655

@sgibson91 sgibson91 changed the title requestor-pays-access Document how to enable Requester Pays access to Pangeo data for GCP hubs Sep 7, 2021
@sgibson91 sgibson91 requested review from choldgraf and yuvipanda and removed request for choldgraf September 7, 2021 16:03
@sgibson91 sgibson91 self-assigned this Sep 7, 2021
@sgibson91 sgibson91 marked this pull request as ready for review September 7, 2021 16:04
Copy link
Member

@choldgraf choldgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few quick thoughts and suggestions from me. In general this looks quite nice, thanks for explaining all of this as you figured it out!

@@ -0,0 +1,103 @@
# Pangeo Data Access via Requester Pays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should make this generic instead of Pangeo-specific, since I could imagine other hubs would want the same setup?

@@ -0,0 +1,103 @@
# Pangeo Data Access via Requester Pays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use a top-level section like Data access and then this could be the first content in that section. I think that could bring this in-line with our Hub Administrator's guide "topic guides" here: https://pilot.2i2c.org/en/latest/


For some hubs, such as our Pangeo deployments, the communities they serve require access to data stored in other projects.
Accessing data normally comes with a charge that the folks _hosting_ the data have to take care of.
However, there is a method by which those making the request are responsible for the charges instead: [Requester Pays](https://cloud.google.com/storage/docs/requester-pays).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "Requestor pays" access something unique to GCP? Or are we just prototyping this first on GCP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, it is unique to GCP. I've not come across it before being introduced to the Pangeo deployment.

## Setting up Requester Pays Access on GCP

```{note}
We may automate these steps in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely feels like something that is semi-common for research groups, and that is not scalable in its current implementation of manual steps :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. We should open an issue to figure out how to automate this as I suspect that it can't be confined to just the terraform scripts or just the deployer module, as for the last step you need a hub up and running. In practice though, this is a 5 min manual job.

docs/howto/configure/pangeo-requester-pays-access.md Outdated Show resolved Hide resolved
docs/howto/configure/pangeo-requester-pays-access.md Outdated Show resolved Hide resolved
@sgibson91
Copy link
Member Author

sgibson91 commented Sep 8, 2021

@choldgraf I think I've addressed your comments, let me know what you think!

@yuvipanda
Copy link
Member

Thanks for fixing this one, @sgibson91. I had already automated this with config connector, creating the appropriate bindings in https://github.com/2i2c-org/pilot-hubs/blob/4d84506587142a36205b56dd3592e45aca649e21/hub-templates/basehub/templates/cloud-resources/gcp/service-account.yaml#L25. So this should have maybe already worked - the question for me is why did it not? Was there a feature we needed to turn on? Are these config connector resources being created appropriately?

@sgibson91
Copy link
Member Author

@yuvipanda Some random thoughts:

  • I wonder of the annotation (step 4 in the docs) wasn't happening?
  • Role storage.objectViewer also seems to be missing from that link
  • Object Viewer and Service Usage consumer should be granted on the project

Let's compare requestor-pays-sa and pangeo-hubs-cluster-sa here to debug

@sgibson91
Copy link
Member Author

Something is definitely broken in that config though as I had to manually set this up for COESSING too #473 (comment) which also had the config connector enabled https://github.com/2i2c-org/pilot-hubs/blob/4d84506587142a36205b56dd3592e45aca649e21/terraform/gcp/projects/pangeo-181919.tfvars#L10

@yuvipanda
Copy link
Member

@sgibson91
Copy link
Member Author

Hmmm, then we need to make that more discoverable/apparent that that's what it's doing. Even from that snippet, I don't think it's clear that that is also enabling requester pays access. Any recommendations?

@yuvipanda
Copy link
Member

@sgibson91 let's try and see if it works, and if it does, we'll have to find a way to document that. I haven't touched it since we set up the first 2i2c clusters. I didn't add any docs when I first implemented that - I'm really sorry about that :(

Let's try turning it on and see if it works?

@sgibson91
Copy link
Member Author

Sure - no worries!

@sgibson91
Copy link
Member Author

@yuvipanda the config connector stuff in the helm chart doesn't work as it creates an annotation that looks like this:

Annotations: iam.gke.io/gcp-service-account: staging-user-sa@pangeo-integration-te-3eea.iam.gserviceaccount.com

But that service account doesn't exist anywhere in the Google project - I don't think it's created by terraform (we only create *-cluster-sa and *-cd-sa).

I'll open a new issue to track fixing this up (I think it's doable if we set a fixed display name, like requester-pays-sa as I've suggested in the docs) and I'll reenable the manual setup for now.

@yuvipanda
Copy link
Member

@sgibson91 that service account should've been also created by configconnector with https://github.com/2i2c-org/pilot-hubs/blob/f6db9a19b4058e1080312a75e9d5798cb3989a60/hub-templates/basehub/templates/cloud-resources/gcp/service-account.yaml#L3. We can't create it in terraform since it's one per hub rather than one per deployment.

@sgibson91
Copy link
Member Author

sgibson91 commented Sep 9, 2021

I'll double check it tomorrow but I checked the Google project after running the helm chart with that config enabled and noticed the name of the service account and it wasn't there.

@sgibson91
Copy link
Member Author

I opened #669 to track this

@sgibson91
Copy link
Member Author

I've updated this PR so the manual steps reflect what we expect the automation to be doing. Can we get this merged and then concentrate on figuring out why the automation doesn't work in #669?

Copy link
Contributor

@damianavila damianavila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this one merged and figure it out why the automation is not working as part of another ticket, as @sgibson91 proposed.

@sgibson91 sgibson91 merged commit 380a8bf into 2i2c-org:master Sep 11, 2021
@sgibson91 sgibson91 deleted the requestor-pays-access branch September 11, 2021 09:00
@choldgraf
Copy link
Member

thanks @sgibson91 for digging into this + the follow up issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pangeo Hub access to requester pays data
4 participants