Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FSS.S3 support when bucket is not provided #682

Merged
merged 4 commits into from
Aug 23, 2023

Conversation

philss
Copy link
Member

@philss philss commented Aug 19, 2023

This makes Explorer treat the "endpoint" as already containing the bucket name, and force the virtual host style in order to make Polars ignore the bucket name.

Since FSS is always building the correct endpoint in case the user does not provide, we changed the endpoint to be required in the Rust struct.

This work is related to elixir-explorer/fss#2

@philss philss requested a review from josevalim August 19, 2023 05:16
hash =
:crypto.hash(:sha256, entry.bucket <> "/" <> entry.key) |> Base.url_encode64(padding: false)
:crypto.hash(:sha256, bucket <> "/" <> entry.key) |> Base.url_encode64(padding: false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would encode the endpoint in the hash too.

uri =
if endpoint = entry.config.endpoint do
URI.parse(endpoint)
if not is_nil(config.bucket) and not String.contains?(config.endpoint, config.bucket) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invert the if and else branches to simplify the condition?

@josevalim
Copy link
Member

josevalim commented Aug 19, 2023

Hi @philss, unfortunately I am worried of false positives. Imagine you have a bucket named "co" or "zon". Those will both be part of the URL. A better check would be to get the endpoint, split on "." and "/", and then check if it is present, but even that will have a false positive on "com".

I would prefer to be explicit. Could we do this instead:

  1. If a bucket is given (not nil), assume it is a path-style route, and not in the endpoint
  2. If a bucket is not given (nil), assume it is virtual host, and it must be in the endpoint

Then, we don't have to guess. Kino will always return virtual style, with no bucket. When parsing "s3://" style URLs, we can do either "s3://bucket/foobar" (the bucket is not in the endpoint) and "s3:///foobar" (virtual host style, the bucket will be in the endpoint).

This should support all cases with no ambiguity. Would this work?

This makes Explorer treat the "endpoint" as already containing the bucket
name, and force the virtual host style in order to make Polars ignore the
bucket name.

Since FSS is always building the correct endpoint in case the user does not
provide, we changed the endpoint to be required in the Rust struct.

This work is related to elixir-explorer/fss#2
It addresses José's suggestions to simplify the endpoint.
@philss philss force-pushed the ps-handle-s3-entries-without-bucket-name branch from ce54bb2 to e1eb8d6 Compare August 23, 2023 15:59
@philss philss marked this pull request as ready for review August 23, 2023 16:06
@philss philss merged commit 104e197 into main Aug 23, 2023
4 checks passed
@philss philss deleted the ps-handle-s3-entries-without-bucket-name branch August 23, 2023 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants