Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(integrations/object_store): implement put_opts and get_opts #5513

Merged
merged 3 commits into from
Jan 13, 2025

Conversation

meteorgan
Copy link
Contributor

@meteorgan meteorgan commented Jan 6, 2025

Which issue does this PR close?

Part of: #5115.

Rationale for this change

What changes are included in this PR?

  1. implement put_opts and get_opts
  2. add integration tests for object_store_opendal

Are there any user-facing changes?

No

@meteorgan
Copy link
Contributor Author

meteorgan commented Jan 6, 2025

Hi, @Xuanwo, I have some issues to discuss:

  1. I believe we should add more tests for object_store_opendal. especially for the newly added APIs. However services::Memory doesn't support the required options. should I extend the services::Memory to include these options or should i just add some integration tests using Minio to test them ?

@meteorgan
Copy link
Contributor Author

  1. According to GetOptions, object_store returns different error codes when conditions aren't not met. For if_match, it returns Error::Precondition, and for if_none_match, it returns Error::NotModified. However in opendal, the returned Error doesn't allow us to distinguish between them.
    Do you have any advice ? should we add related information to opendal::Error to differentiate these cases ?

@meteorgan
Copy link
Contributor Author

  1. As FuturesBytesStream returns an io::Result<Bytes>, we can't the access actual Error returned by opendal. This is because the Error is triggered when calling StreamingReader::read(), not during the construction of the reader.

@Xuanwo
Copy link
Member

Xuanwo commented Jan 7, 2025

  1. I believe we should add more tests for object_store_opendal. especially for the newly added APIs. However services::Memory doesn't support the required options. should I extend the services::Memory to include these options or should i just add some integration tests using Minio to test them ?

Ideally, we should test object_store_opendal using integration tests, similar to how we test the Python bindings. Initially, we can start by testing with fs and s3. However, since object_store has different semantics compared to opendal, implementing the tests might be somewhat challenging.

  1. Do you have any advice ? should we add related information to opendal::Error to differentiate these cases ?
  • For requests with if_match, OpenDAL returns ConditionNotMatch, which we can convert to Precondition for object_store.
  • For requests with if_none_match, OpenDAL returns ConditionNotMatch, which we can convert to NotModified for object_store. Note that for PUT requests with if_none_match("*"), we can also convert this to Precondition.
  • For requests with a combination of if_match and if_none_match, it doesn't make much difference (this request will prioritize checking if_match first). We can always convert it to Precondition.

The idea here is that we can always determine the meaning of this condition and map it to the corresponding error in object_store. Alternatively, it's also acceptable to simply return Precondition as a starting point.

  1. As FuturesBytesStream returns an io::Result<Bytes>, we can't the access actual Error returned by opendal. This is because the Error is triggered when calling StreamingReader::read(), not during the construction of the reader.

io::Error provides a downcast API which we can get our original error back: https://doc.rust-lang.org/stable/std/io/struct.Error.html#method.downcast

@meteorgan
Copy link
Contributor Author

Thank you for you advise !

Ideally, we should test object_store_opendal using integration tests, similar to how we test the Python bindings. Initially, we can start by testing with fs and s3. However, since object_store has different semantics compared to opendal, implementing the tests might be somewhat challenging.

I'm wondering whether we need to test object_store_opendal with every service(e.g., fs, s3, etc). Since we've already tested opendal with these services, we can believe they exhibit the same behavior within opendal. Therefore, we could test object_store_opendal with just one service and assume the other will work correctly as well.

  • For requests with if_match, OpenDAL returns ConditionNotMatch, which we can convert to Precondition for object_store.
  • For requests with if_none_match, OpenDAL returns ConditionNotMatch, which we can convert to NotModified for object_store. Note that for PUT requests with if_none_match("*"), we can also convert this to Precondition.
  • For requests with a combination of if_match and if_none_match, it doesn't make much difference (this request will prioritize checking if_match first). We can always convert it to Precondition.

The idea here is that we can always determine the meaning of this condition and map it to the corresponding error in object_store. Alternatively, it's also acceptable to simply return Precondition as a starting point.

Using this approach seems a bit trivial. What if a request includes a combination of if_match and if_modified_since ? for if_match, object_store requires PreCondition, whereas for if_modified_since, it requires NotModified

io::Error provides a downcast API which we can get our original error back: https://doc.rust-lang.org/stable/std/io/struct.Error.html#method.downcast

Our current MSRV is 1.75.0, but this feature became stable since 1.79.0. Should we continue using the current MSRV or is it okay to upgrade to the newer one?

@Xuanwo
Copy link
Member

Xuanwo commented Jan 7, 2025

Therefore, we could test object_store_opendal with just one service and assume the other will work correctly as well.

If everything goes smoothly, that's great. However, unexpected things can always happen, so it's better to have all services covered by integrate with our behavior test system.

Using this approach seems a bit trivial. What if a request includes a combination of if_match and if_modified_since ? for if_match, object_store requires PreCondition, whereas for if_modified_since, it requires NotModified

The entire composition could be complex, as demonstrated in azblob's matrix:

image

The expected error for if_match and if_modified_since in object_store isn't universally applicable across all services and combinations. In fact, different services might return varying errors for the same combination.

From my perspective, we don't need to impose restrictions here or aim for perfect alignment with object_store. We can revisit this if OpenDAL's own users feel that ConditionNotMatch is insufficient.

@Xuanwo
Copy link
Member

Xuanwo commented Jan 7, 2025

Our current MSRV is 1.75.0, but this feature became stable since 1.79.0. Should we continue using the current MSRV or is it okay to upgrade to the newer one?

We can use io::Error::into_inner and then Error::downcast as an alternative. io::Error::downcast is just a quick wrapper.

@meteorgan
Copy link
Contributor Author

We can use io::Error::into_inner and then Error::downcast as an alternative. io::Error::downcast is just a quick wrapper.

Thank you, I have no further questions now. I'll move forward.

@meteorgan meteorgan force-pushed the object_store_options_api branch 2 times, most recently from ebd6d08 to fa78838 Compare January 10, 2025 15:33
@meteorgan meteorgan force-pushed the object_store_options_api branch from fa78838 to 3d5a5a5 Compare January 12, 2025 09:41
@meteorgan meteorgan marked this pull request as ready for review January 12, 2025 11:15
@meteorgan meteorgan requested a review from PsiACE as a code owner January 12, 2025 11:15
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @meteorgan for working on this!

@Xuanwo Xuanwo merged commit 03b78f1 into apache:main Jan 13, 2025
34 checks passed
@meteorgan meteorgan deleted the object_store_options_api branch January 13, 2025 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants