[CT-2158] [Feature] Support Workload Identity Federation for Headless Authentication into BigQuery #593
Comments
As a very good workaround, it is recommended that you UPLOAD service account keys to GCP and dbt Cloud and rotate them aggressively.
Thanks for opening this and also supplying a "very good workaround" @ernestoongaro! What would you imagine the
my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: azure-ad
      # Identity federation for Azure AD auth
      some_key: [some_value]
      some_other_key: [some_other_value]
      ...
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
This is being tracked internally at dbt Labs (this link is behind a login but can be used to reference in the future: https://dbt-labs.productboard.com/entity-detail/features/18692136) |
We want to have a delightful experience around warehouse authentication that is friction-free, so re-opening this public-facing issue. As @ernestoongaro mentioned, we have a separate internal ticket here that we've kept open as well since the most significant pieces of the implementation will take place within dbt Cloud. |
@dbeatty10, Workload Identity Federation is not for end-user connections but for headless/service access to resources, so Azure AD wouldn't fit this use case. Based on your comment above and a demonstration of how WIF works with GH Actions, the profile would most likely look more like:
my-snowflake-db:
  target: dev
  outputs:
    prod:
      type: bigquery
      method: workload-identity-federation
      # Identity federation info
      workload_identity_provider: [some_value]
      service_account: [name_of_the_service_account]
      ...
On the GCP side, people would need to set up a Workload Identity Pool and register an OpenID Connect (OIDC) provider with its issuer URL. This means that we need a service in place that can communicate with GCP using the OIDC protocol. This would happen on the dbt Cloud side and would require some development there. On the
The logic of the GH Action example from the video is available in the google-github-actions/auth repo on this page (in TypeScript, though). We can see which API calls are made. |
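As a rough illustration of the first call in that kind of flow, here is a minimal Python sketch of the form parameters for an OAuth 2.0 token exchange (RFC 8693) against Google's Security Token Service. The project number, pool ID, provider ID, and token placeholder are hypothetical, and the helper names are made up for this example; it only builds the request body and does not send anything.

```python
from urllib.parse import urlencode

# Google's Security Token Service endpoint for the token exchange.
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def provider_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Full resource name of a workload identity pool provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


def sts_exchange_params(oidc_token: str, audience: str) -> dict:
    """Form parameters for exchanging an external OIDC token for a federated token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": oidc_token,
    }


# Hypothetical values, for illustration only.
audience = provider_audience("123456789", "example-pool", "example-provider")
body = urlencode(sts_exchange_params("<oidc-jwt-from-issuer>", audience))
# `body` would be POSTed to STS_TOKEN_URL as application/x-www-form-urlencoded.
```

The federated token returned by that exchange can then be used to request a short-lived access token for the target service account, which is the impersonation step discussed in this thread.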
I really don't want to do this, but +1. We'd want to do something similar, so if it helps, register my interests to keep this issue open |
It's not really clear to me what value this adds. Is this intended for dbt-core users or dbt Cloud? If core, then you should be using workload identity via whatever worker is running dbt-core. We run dbt-core on kubernetes for example, and it uses workload identity and then has a federated identity it can use to authenticate to GCP as a GCP service account. dbt can run in the usual oauth mode once the worker has logged in, for example with If Cloud then the ask seems more like, adding the ability to trust dbt Cloud as a workload identity provider, and I'm not sure that's the scope defined in the ticket? |
Hi @mwstanleyft yes the intention is for getting it to work with dbt Cloud, there might be some changes required in Core. Thanks for the comment! |
I see - then yeah, a lot of the discussion above doesn't make a ton of sense :D For one thing, you wouldn't use profiles.yml if you're using dbt Cloud, would you? I think the discussion above about Azure AD, on-prem AD FS, and Okta etc, are a red herring. I also think this feature has nothing particular to do with BigQuery or GCP, and the support required from dbt Labs is broadly the same for GCP/BigQuery as it would be for AWS IAM, Azure, or anyone else who supports federating login to a third party identity provider. So for this request, you would want dbt Cloud itself to be trusted as an identity provider and provide an OIDC-compliant token issuer endpoint (similar to this one that GitHub provides) and its own identities. This would allow workloads in dbt Cloud to impersonate service accounts in your Cloud environment when they're working on dbt runs. I'm not sure what this would require for the Cloud IDE to function correctly (because you would want production runs to use a separate service account to developers, who should probably be logged in via OAuth as normal), but simplistically the solution would probably be to have the connection manager in dbt Cloud support OIDC as an option - the user would need to set up their cloud environment properly and then provide the pool provider details and the service account name, and then dbt Cloud will be able to impersonate that service account with GCP on its own authority as the token issuer. Seems like a substantial amount of effort on behalf of dbt Labs to deliver this! |
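For context, Google's client libraries can already consume an "external account" credential configuration that carries the pool provider details and the service account name. The Python sketch below builds such a configuration as a plain dictionary. The pool, provider, service account email, and token file path are hypothetical, the function name is invented for this example, and how dbt Cloud would store or consume such settings is not defined anywhere in this thread.

```python
import json


def external_account_config(audience: str, service_account_email: str, token_file: str) -> dict:
    """Build a workload identity federation credential configuration.

    `audience` is the full resource name of the pool provider, and
    `token_file` is where the external OIDC token is expected to be found.
    """
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{service_account_email}:generateAccessToken"
        ),
        "credential_source": {"file": token_file},
    }


# Hypothetical values, for illustration only.
config = external_account_config(
    audience=(
        "//iam.googleapis.com/projects/123456789/locations/global/"
        "workloadIdentityPools/example-pool/providers/example-provider"
    ),
    service_account_email="dbt-runner@example-project.iam.gserviceaccount.com",
    token_file="/var/run/secrets/oidc/token",
)
print(json.dumps(config, indent=2))
```

No long-lived service account key appears anywhere in this configuration; the only secret is the short-lived OIDC token supplied by the issuer.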
And yeah I should reiterate that this isn't necessary at all for dbt-core since the end user is in control of the compute where dbt-core runs and can decide how they want to federate identities to that workload. It only matters for jobs running in dbt Cloud, where dbt Labs owns the worker running the dbt command. |
I think the issue in self-hosting re: GCP Workload Identity, at least in what I saw, lies specifically in authentication to BigQuery? Based on the docs here - https://docs.dagster.io/integrations/bigquery/using-bigquery-with-dagster#prerequisites It's been a while since I set up my Dagster instances in GKE, but I believe the shortcoming of using Workload Identity when not running on Dagster Cloud was related to BQ (again, not sure if this is a Dagster or BQ limitation; I can provide some more info if it's helpful) |
I have no shortcomings when using dbt-core in GKE and BQ and using Workload Identity. Works great. We also use Dagster but we self-host our agent in GKE along with all the workers for the jobs. You don't even need OIDC for that since it's GCP-native. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
I think you would probably be better off treating this as an adapter-agnostic feature request along the lines of "Set up dbt Cloud as an OIDC Identity Provider, create an OIDC-compliant token issuer endpoint and support workload identity federation in dbt Cloud". This is really a pure dbt Cloud feature. There will be some specifics for each database you're going to use workload identity with (fields you will need to add / change on the connection setup screen in dbt Cloud) but the issuer endpoint and related features can be shared and aren't BigQuery specific. All the major cloud computing providers have some type of support for OIDC workload identity federation with third-party IdPs. |
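To make the "token issuer endpoint" idea a little more concrete: cloud providers typically locate an OIDC issuer's signing keys through a discovery document served under the issuer URL. The Python sketch below shows what a minimal document of that kind might contain. The issuer URL, the JWKS path, and the claim list are all assumptions made for illustration and do not describe any existing dbt Cloud endpoint.

```python
import json


def discovery_url(issuer: str) -> str:
    """Location of the OIDC discovery document for a given issuer URL."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def discovery_document(issuer: str) -> dict:
    """Minimal discovery metadata for an issuer that only mints ID tokens."""
    base = issuer.rstrip("/")
    return {
        "issuer": base,
        # Hypothetical path; public signing keys are published here as a JWK Set.
        "jwks_uri": base + "/.well-known/jwks",
        "response_types_supported": ["id_token"],
        "subject_types_supported": ["public"],
        "id_token_signing_alg_values_supported": ["RS256"],
        "claims_supported": ["iss", "sub", "aud", "exp", "iat"],
    }


# Hypothetical issuer URL, for illustration only.
issuer = "https://oidc.example.invalid"
print(discovery_url(issuer))
print(json.dumps(discovery_document(issuer), indent=2))
```

Because this metadata is the same regardless of which warehouse trusts the issuer, it fits the point that the issuer side can be shared across adapters while only the connection fields differ.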
Is this your first time submitting a feature request?
Describe the feature
Traditionally, applications running outside Google Cloud can use service account keys to access Google Cloud resources. However, service account keys are powerful credentials, and can present a security risk if they are not managed correctly.
With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys.
Describe alternatives you've considered
OAuth is fine for developer authentication, but not great for something that will be scheduling the runs (like dbt Cloud).
Who will this benefit?
Any security-conscious GCP users
Are you interested in contributing this feature?
No response
Anything else?
Specifically this request is for use with Azure AD (which is OIDC compliant) but there are other schemes supported: