-
Notifications
You must be signed in to change notification settings - Fork 924
Remove AsyncFileReader::get_metadata_with_options
, add options
to AsyncFileReader::get_metadata
#7342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove AsyncFileReader::get_metadata_with_options
, add options
to AsyncFileReader::get_metadata
#7342
Conversation
AsyncFileReader::get_metadata_with_options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @corwinjoy -- this is a good improvement for sure
Since we can change the API in the next breaking release, what do you think about simply removing get_metadata
entirely?
That would mean that all downstream crates would be forced to handle or explicitly ignore the options, which I think would be good.
I am talking about removing this:
/// Provides asynchronous access to the [`ParquetMetaData`] of a parquet file,
/// allowing fine-grained control over how metadata is sourced, in particular allowing
/// for caching, pre-fetching, catalog metadata, etc...
fn get_metadata(&mut self) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>>;
For anyone else following along |
You're suggesting removing fn get_metadata(&mut self, options: &ArrowReaderOptions) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>>; ? |
I think changing the signature would also be a good choice |
Looking through how this function is used and implemented, many times users just want to grab metadata and don't care about options. Furthermore, right now we have a lot of code duplication between implementations of
I think this would make any applicable options explicit while streamlining the existing code. |
OK. I have updated this PR to change the signature of
as discussed. I think this reduces code duplication and ambiguity. |
AsyncFileReader::get_metadata_with_options
AsyncFileReader::get_metadata_with_options
, add options
to AsyncFileReader::get_metadata
I updated this PR's title to reflect what I think it currently does It looks like this PR has some merge conflicts, but once those are solved it will be good to go from my perspective |
…_with_options # Conflicts: # parquet/src/arrow/async_reader/store.rs # parquet/tests/arrow_reader/encryption_async.rs
Thank you! I have gone ahead and merged the latest from main to resolve these conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this has turned out. Thanks @corwinjoy!
Thanks everyone! |
I was just updating some of my code (parquet-wasm) to use the release candidate, but it doesn't seem possible for third party code to use the The internal implementation here on
But third party code can't do that. Is that intentional? |
It was not intentional to my knowledge. -- maybe we can add some accessors to ArrowReaderOptions? |
Rationale for this change
In using encryption with datafusion, I found the default implementation for
get_metadata_with_options
to be more of a hindrance than a help. What happened, is that when working with encrypted parquet, my calls would mysteriously fail and I could not understand why. It took me longer than I like to admit to discover that the default implementation forget_metadata_with_options
would quietly drop decryption information, and that was the source of my mysterious file reading failures. I propose that this default implementation should be removed so that developers get a clear compiler message that they should be explicit about what they want. If this is too much of a breaking API change, then a second best solution would be to provide a runtime error. Either way, I feel that the default implementation is simply too likely to cause unexpected and surprising errors.What changes are included in this PR?
Remove the default implementation for
get_metadata_with_options
for theAsyncFileReader: Send
trait.Are there any user-facing changes?
Users will need to be explicit about what they want to do for
get_metadata_with_options
for theAsyncFileReader: Send
trait. Having no default implementation is in line with other methods here. I feel that asking users to be explicit is a good change here to prevent potential errors.