Infer default credential providers #414

Open
kylebarron opened this issue Apr 7, 2025 · 5 comments

Comments

@kylebarron
Member

I'm thinking something like this in store.py:

(Not sure if all these credential providers exist in obstore, but assume that, when called with no arguments, they return the vendor's default credential.)

def from_url(...):
    ...
    scheme = _parse_scheme(url)
    vendor_default_credential_providers = {
        "s3": S3CredentialProvider,
        "gcs": GoogleCredentialProvider,
        "azure": AzureCredentialProvider,
    }

    vendor_default_async_credential_providers = {
        "s3": S3AsyncCredentialProvider,
        "gcs": GoogleAsyncCredentialProvider,
        "azure": AzureAsyncCredentialProvider,
    }

    # Opt-in only: an explicitly passed credential_provider always wins.
    if credential_provider is None:
        if vendor_default_credential_provider:
            provider_cls = vendor_default_credential_providers.get(scheme)
        elif vendor_default_async_credential_provider:
            provider_cls = vendor_default_async_credential_providers.get(scheme)
        else:
            provider_cls = None
        # Schemes without a known vendor default leave credential_provider as None.
        if provider_cls is not None:
            credential_provider = provider_cls()

    if scheme == "s3":
        ...

Implications:

  • Existing behaviour of from_url is unchanged when new keywords are omitted.
  • Two new keywords added to from_url: vendor_default_credential_provider and vendor_default_async_credential_provider. These allow the user to opt into using an automatically selected vendor default credential provider, and allow the user to specify whether they want it to be synchronous or asynchronous.
  • If one of the new keywords is specified, and credential_provider is not explicitly specified, then the appropriate vendor default credential provider will be used.
  • There is a one-to-one mapping from scheme to vendor default credential provider. Users must still explicitly specify a credential provider if they need one other than the vendor default.
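
The implications above can be checked with a small, self-contained sketch. Everything here is hypothetical: the provider classes are empty stand-ins (not obstore's real classes), and `resolve_credential_provider` is an illustrative helper name, not an existing obstore function. It only shows the proposed precedence: an explicit provider wins, the flags are opt-in, and unknown schemes resolve to nothing.

```python
from typing import Any, Optional


# Hypothetical stand-ins for vendor default providers; not obstore's real classes.
class S3Default: ...
class GoogleDefault: ...
class AzureDefault: ...
class S3AsyncDefault: ...
class GoogleAsyncDefault: ...
class AzureAsyncDefault: ...


# One-to-one mapping from URL scheme to vendor default provider class.
SYNC_DEFAULTS = {"s3": S3Default, "gcs": GoogleDefault, "azure": AzureDefault}
ASYNC_DEFAULTS = {"s3": S3AsyncDefault, "gcs": GoogleAsyncDefault, "azure": AzureAsyncDefault}


def resolve_credential_provider(
    scheme: str,
    credential_provider: Optional[Any] = None,
    *,
    vendor_default_credential_provider: bool = False,
    vendor_default_async_credential_provider: bool = False,
) -> Optional[Any]:
    """Return the provider to use, following the precedence proposed above."""
    # An explicitly passed provider always takes priority.
    if credential_provider is not None:
        return credential_provider
    # The sync flag is checked first, mirroring the if/elif order in the proposal.
    if vendor_default_credential_provider:
        table = SYNC_DEFAULTS
    elif vendor_default_async_credential_provider:
        table = ASYNC_DEFAULTS
    else:
        return None  # no new keyword given: existing behaviour is unchanged
    provider_cls = table.get(scheme)
    return provider_cls() if provider_cls is not None else None
```

With neither keyword set the helper returns `None`, which corresponds to the first implication: callers who do not opt in see no change.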

Originally posted by @daviewales in #267

@kylebarron
Member Author

I'm still not sure this is a good idea. I'd prefer to nudge people towards the Rust-native authentication if that meets their needs.

And these APIs look and feel very hacky.

@daviewales
Contributor

I agree it would be preferable to re-use the rust-native authentication if possible. I've updated my request to enable Azure CLI fallback upstream, as this would solve my immediate use case.

Regarding the auto-selection of vendor credential providers based on URL scheme, how much of the hacky feel is due to my sample code, which I admit is indeed hacky, and how much is 'essential hackiness', which could not be avoided even if the general idea was carefully rewritten?

For example, I could imagine reducing the hackiness by just passing through the vendor_default_credential_provider argument from from_url to the individual stores, and handling it there:

def from_url(
    url: str,
    *,
    config: S3Config | GCSConfig | AzureConfig | None = None,
    client_options: ClientConfig | None = None,
    retry_config: RetryConfig | None = None,
    credential_provider: S3CredentialProvider
    | GCSCredentialProvider
    | AzureCredentialProvider
    | None = None,
    vendor_default_credential_provider: bool = False,  # <-- Add option to use vendor default credential providers
    **kwargs: Any,
) -> ObjectStore:
    ...
    scheme = _parse_scheme(url)
    if scheme == "s3":
        return S3Store.from_url(
            url,
            config=config,
            client_options=client_options,
            retry_config=retry_config,
            credential_provider=credential_provider,
            vendor_default_credential_provider=vendor_default_credential_provider, # <-- pass through to stores
            **kwargs,
        )
    ...
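
For illustration, the receiving side of that pass-through might look roughly like the sketch below. `FakeS3Store` and `FakeVendorProvider` are hypothetical stand-ins invented for this example; they are not obstore's `S3Store` or any real credential provider, and the real constructor takes more arguments than shown.

```python
from typing import Any, Optional


class FakeVendorProvider:
    """Hypothetical stand-in for a vendor's default credential provider."""


class FakeS3Store:
    """Hypothetical stand-in for a store class; not obstore's S3Store."""

    def __init__(self, url: str, credential_provider: Optional[Any] = None) -> None:
        self.url = url
        self.credential_provider = credential_provider

    @classmethod
    def from_url(
        cls,
        url: str,
        *,
        credential_provider: Optional[Any] = None,
        vendor_default_credential_provider: bool = False,
    ) -> "FakeS3Store":
        # Each store knows its own vendor default, so from_url at the top
        # level needs no scheme-to-provider table.
        if credential_provider is None and vendor_default_credential_provider:
            credential_provider = cls._vendor_default()
        return cls(url, credential_provider)

    @staticmethod
    def _vendor_default() -> FakeVendorProvider:
        # Constructed only on request, so the dependency stays optional.
        return FakeVendorProvider()
```

The design choice this illustrates is that the default lives next to the store it belongs to, at the cost of adding the same keyword to every store's `from_url`.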

@kylebarron
Member Author

I think a better way to solve this is with something like an object store registry (https://docs.rs/datafusion/latest/datafusion/datasource/object_store/trait.ObjectStoreRegistry.html), so that users can register ways to handle different protocols.

For example, for generic s3 URLs users might pass in the boto3 credential provider, but override that for a specific s3 bucket with known authentication.
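
A minimal sketch of what such a registry could look like in Python, assuming longest-prefix matching on the URL. `StoreRegistry`, `register`, and `resolve` are hypothetical names chosen for this example; this is neither the DataFusion trait's API nor anything that exists in obstore, and the factories here return plain tuples instead of real stores.

```python
from typing import Any, Callable, Dict

# A factory takes the full URL and returns a configured store (any object here).
StoreFactory = Callable[[str], Any]


class StoreRegistry:
    """Hypothetical registry mapping URL prefixes to store factories."""

    def __init__(self) -> None:
        self._factories: Dict[str, StoreFactory] = {}

    def register(self, url_prefix: str, factory: StoreFactory) -> None:
        # Registering the same prefix again replaces the earlier factory.
        self._factories[url_prefix] = factory

    def resolve(self, url: str) -> Any:
        # The most specific (longest) matching prefix wins, so a bucket-level
        # entry overrides a scheme-level one.
        matches = [prefix for prefix in self._factories if url.startswith(prefix)]
        if not matches:
            raise KeyError(f"no store registered for {url!r}")
        return self._factories[max(matches, key=len)](url)


registry = StoreRegistry()
# Generic handler for every s3 URL.
registry.register("s3://", lambda url: ("generic-s3", url))
# Override for one bucket with known authentication (bucket name is made up).
registry.register("s3://private-bucket/", lambda url: ("private-s3", url))
```

Bucket-level prefixes are registered with a trailing slash here so that `s3://private-bucket/` does not also match a bucket such as `s3://private-bucket-2`.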

@daviewales
Contributor

Would obstore provide a default / pre-configured object store registry, and allow users to override/extend as required?

@kylebarron
Member Author

kylebarron commented Apr 16, 2025

There probably wouldn't be a default registry, or at least not with these credential providers, because we need them to remain optional dependencies.
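
One way to picture the optional-dependency constraint: a pre-configured default could only cover the schemes whose extra package happens to be installed. The sketch below checks this without importing the packages themselves. The scheme-to-package mapping is an assumption made for illustration, and `is_available` and `available_schemes` are hypothetical helper names.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from scheme to the optional package a vendor provider would need.
OPTIONAL_PACKAGES: Dict[str, str] = {
    "s3": "boto3",
    "gcs": "google.auth",
    "azure": "azure.identity",
}


def is_available(module_name: str) -> bool:
    """Report whether a module can be found, without importing the module itself."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package of a dotted name is missing,
        # or if an already-imported module has no usable spec.
        return False


def available_schemes(requirements: Dict[str, str]) -> List[str]:
    """Return the schemes whose optional dependency is installed, sorted by name."""
    return sorted(scheme for scheme, module in requirements.items() if is_available(module))
```

Calling `available_schemes(OPTIONAL_PACKAGES)` gives a different answer in each environment, which is the difficulty with shipping a single default registry.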
