Description
Is your feature request related to a problem or challenge?
In apache/arrow-rs-object-store#17 I described a general need for the ObjectStore trait to support passing contextual data to custom implementations. In apache/arrow-rs#7160 I implemented an approach to this by giving GetOptions the ability to store opaque values indexed by their TypeId, similar to what is possible in datafusion with SessionConfig.
This issue is about taking this new behavior in ObjectStore and incorporating it here in datafusion, such that the custom data on a SessionConfig is passed on when creating GetOptions instances for retrieving files from an object store.
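For readers unfamiliar with the pattern, here is a minimal, standard-library-only sketch of a map of opaque values indexed by TypeId. The Extensions and TraceParent types below are hypothetical stand-ins written for illustration; they are not the actual object_store or datafusion types, whose APIs may differ.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a TypeId-indexed extension map; not the real
/// object_store or datafusion type.
#[derive(Clone, Default)]
pub struct Extensions {
    map: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl Extensions {
    /// Store `value`, replacing any previous value of the same type.
    pub fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Arc::new(value));
    }

    /// Look up the value stored for type `T`, if any.
    pub fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }
}

/// Hypothetical piece of context a caller might want to pass along,
/// for example an identifier for a parent tracing span.
#[derive(Debug, PartialEq)]
pub struct TraceParent(pub String);
```

Because values sit behind an Arc, cloning the map is cheap and every clone shares the same underlying values, which is what makes it practical to copy such a map into each request's options.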
Describe the solution you'd like
I think the simplest approach here would be one where we create a new ObjectStore implementation during query execution that looks something like:
struct ContextualizedObjectStore {
    inner: Arc<dyn ObjectStore>,
    extensions: object_store::Extensions,
}
We would then have a get_opts method in its ObjectStore trait impl that looks something like:
async fn get_opts(
    &self,
    location: &Path,
    mut options: GetOptions,
) -> object_store::Result<GetResult> {
    options.extensions = self.extensions.clone();
    self.inner.get_opts(location, options).await
}
Initializing instances of this new type as a wrapper around whatever Arc<dyn ObjectStore> is available would look something like:
let object_store = context
    .runtime_env()
    .object_store(&self.object_store_url)
    .map(|inner: Arc<dyn ObjectStore>| -> Arc<dyn ObjectStore> {
        Arc::new(ContextualizedObjectStore::new(
            inner,
            context.session_config().clone_extensions_for_object_store(),
        ))
    });
With this approach, whenever the resulting Arc<dyn ObjectStore> is used to retrieve a file from object store, the underlying implementation would have access to the object_store::Extensions created from the SessionConfig extensions.
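To make the delegation concrete without pulling in the real crates, here is a self-contained sketch of the same wrapper idea. It assumes a simplified, synchronous Store trait and a plain HashMap in place of object_store::Extensions; Store, GetOptions, ParentSpan, TracingStore, ContextualizedStore and wrap_with_span are all hypothetical names for illustration only.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for object_store::Extensions (assumption).
type Extensions = HashMap<TypeId, Arc<dyn Any + Send + Sync>>;

/// Simplified stand-in for GetOptions; only the field relevant here.
#[derive(Default)]
struct GetOptions {
    extensions: Extensions,
}

/// Simplified, synchronous stand-in for the ObjectStore trait.
trait Store: Send + Sync {
    fn get_opts(&self, location: &str, options: GetOptions) -> Result<String, String>;
}

/// Hypothetical context value, e.g. an identifier for a parent tracing span.
struct ParentSpan(u64);

/// Inner store that reads the context, as a custom implementation might.
struct TracingStore;

impl Store for TracingStore {
    fn get_opts(&self, location: &str, options: GetOptions) -> Result<String, String> {
        let span = options
            .extensions
            .get(&TypeId::of::<ParentSpan>())
            .and_then(|v| v.downcast_ref::<ParentSpan>())
            .map(|s| s.0);
        Ok(format!("{location} parent={span:?}"))
    }
}

/// Wrapper that attaches a fixed set of extensions to every request.
struct ContextualizedStore {
    inner: Arc<dyn Store>,
    extensions: Extensions,
}

impl Store for ContextualizedStore {
    fn get_opts(&self, location: &str, mut options: GetOptions) -> Result<String, String> {
        // Overwrites whatever the caller set, mirroring the snippet above.
        options.extensions = self.extensions.clone();
        self.inner.get_opts(location, options)
    }
}

/// Build a wrapped store carrying a single ParentSpan value.
fn wrap_with_span(inner: Arc<dyn Store>, span_id: u64) -> Arc<dyn Store> {
    let mut extensions = Extensions::new();
    extensions.insert(TypeId::of::<ParentSpan>(), Arc::new(ParentSpan(span_id)));
    Arc::new(ContextualizedStore { inner, extensions })
}
```

In this sketch, calling get_opts on the store returned by wrap_with_span lets the inner TracingStore see the ParentSpan even though the caller passed default options. Note that the wrapper replaces any extensions the caller supplied; whether a real implementation should replace or merge them is a design choice this sketch does not settle.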
Describe alternatives you've considered
This is covered in apache/arrow-rs-object-store#17 and apache/arrow-rs#7135.
Basically, there are two alternative directions:
- Update the ObjectStore API by providing optional trait methods that take an actual context type that can carry custom/extension data.
  - Considered by maintainers to be too heavy-handed.
- Don't do anything.
  - This means for my use case, we wouldn't be able to properly parent tracing spans for object store accesses that happen during query execution.
Additional context
No response