[DISCUSSION] [object_store] New crate with object store combinators / utilities #14
Comments
Thank you for starting this discussion, I think we should definitely provide more utilities/primitives in this space.
FWIW these should probably be deprecated and re-implemented at the HttpClient level.
Now that we have the HttpClient abstraction, I think this is the level at which I would encourage implementing most of these.
This feels like something better built into some sort of TransferManager that sits on top of the ObjectStore API, as opposed to baking it in at the ObjectStore level. Perhaps in a similar vein to BufWriter. This would, for example, allow registering a single ObjectStore, but then having different IO configurations for different areas of the stack. It would also potentially allow for greater concurrency, as the ObjectStore API has no mechanism by which chunks fetched in parallel could be returned out of order. This would be especially useful when downloading files to disk, as it avoids needing to hold chunks in memory unnecessarily. See #267 for some prior discussion.
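For illustration, here is a rough sketch of what such a transfer-manager-style download to disk could look like on top of the existing `head` / `get_range` APIs. The `download_to_file` helper, the chunk size, and the concurrency limit are made up for this example, and it assumes a recent object_store release where sizes and ranges are `u64`:

```rust
// Hypothetical sketch only: download an object to a local file using parallel
// ranged GETs, writing each chunk at its offset as soon as it arrives
// (i.e. out of order). Assumes a recent object_store release where sizes and
// ranges are u64; CHUNK_SIZE and the concurrency limit are illustrative.
use std::sync::Arc;
use futures::StreamExt;
use object_store::{path::Path, ObjectStore};
use tokio::fs::File;
use tokio::io::{AsyncSeekExt, AsyncWriteExt};

const CHUNK_SIZE: u64 = 8 * 1024 * 1024; // 8 MiB per request

async fn download_to_file(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    dest: &mut File,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let size = store.head(location).await?.size;

    // One ranged GET per chunk, up to 8 in flight; results arrive in
    // completion order rather than offset order.
    let mut chunks = futures::stream::iter((0..size).step_by(CHUNK_SIZE as usize))
        .map(|offset| {
            let store = Arc::clone(&store);
            let location = location.clone();
            let end = (offset + CHUNK_SIZE).min(size);
            async move {
                store
                    .get_range(&location, offset..end)
                    .await
                    .map(|bytes| (offset, bytes))
            }
        })
        .buffer_unordered(8);

    // Because the file is written at explicit offsets, chunks do not need to
    // be held in memory until their predecessors arrive.
    while let Some(result) = chunks.next().await {
        let (offset, bytes) = result?;
        dest.seek(std::io::SeekFrom::Start(offset)).await?;
        dest.write_all(&bytes).await?;
    }
    Ok(())
}
```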
FWIW all the first-party implementations share a lot of the same underlying logic, e.g. with things like GetClient, and so it may actually not be all that bad
I think there is room for both some lower-level ObjectStore wrappers as well as a more full-featured transfer manager or higher abstraction, depending on the needs and resources of the underlying application
I've created apache/arrow-rs#7253 as an example of how the HttpClient abstraction can be used for more fine-grained control of requests, including spawning IO to a separate tokio runtime.
I suppose this might also be related to the "Operator" concept in OpenDAL, which can help handle the chunking & concurrency parameters in a builder pattern, roughly like this (a sketch from memory; exact option names vary across OpenDAL versions, and the bucket, path, and sizes are purely illustrative):
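```rust
// Rough sketch from memory of OpenDAL's builder-style configuration; the exact
// builder/option names differ between OpenDAL versions, and the bucket, path,
// chunk size, and concurrency values here are purely illustrative.
use opendal::{services, Operator};

async fn read_with_opendal() -> opendal::Result<()> {
    let op = Operator::new(services::S3::default().bucket("my-bucket"))?.finish();

    // Per-read chunk size and request concurrency configured on the reader builder.
    let reader = op
        .reader_with("data/file.parquet")
        .chunk(8 * 1024 * 1024)
        .concurrent(4)
        .await?;

    let _buffer = reader.read(0..1024).await?;
    Ok(())
}
```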
@crepererum and I spoke about this issue today.
@crepererum rightly pointed out that implementing retries (aka #15) would be better than splitting into smaller requests to stay within a timeout, as the retry mechanism automatically adjusts to current network conditions. However, we otherwise have a few more potential items we may propose upstreaming.
Roughly speaking, what we are thinking is:
@criccomini I am curious if you would have a use for RacingReads. This basically would reduce the overall latency for object store requests by running multiple requests in parallel and returning the one that completed first. The tradeoff is that this strategy increases $$$ linearly as it makes more requests
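For illustration, a rough sketch of what a racing read could look like on top of the existing API (the `racing_get_range` helper is a made-up name, and it assumes a recent object_store release where ranges are `u64`):

```rust
// Hypothetical sketch of a "racing read": issue the same ranged GET twice and
// return whichever response completes first. Doubles the request cost in
// exchange for lower tail latency. `racing_get_range` is a made-up name.
use std::ops::Range;
use std::sync::Arc;
use bytes::Bytes;
use futures::future::select_ok;
use object_store::{path::Path, ObjectStore};

async fn racing_get_range(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    range: Range<u64>,
) -> object_store::Result<Bytes> {
    // `select_ok` resolves with the first Ok result and drops the other
    // in-flight request; it only errors if every racer fails.
    let racers = [
        Box::pin(store.get_range(location, range.clone())),
        Box::pin(store.get_range(location, range)),
    ];
    let (bytes, _losers) = select_ok(racers).await?;
    Ok(bytes)
}
```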
Isn't there an upper bound on the timeout (30s by default)? And if the bound isn't large enough to push that 200MiB row group through a slow connection, won't the request fail anyway? And even if the request succeeds eventually, relying on retries to dynamically adjust the timeout seems wasteful compared to bounding request size, improving the chances the request will succeed the first time.
I think the idea is you don't re-request the entire object, only the bytes remaining. So let's say you had a 200 MB request but the network can only retrieve 10 MB in 30s: each retry would resume from the offset already received, so the download makes steady progress (roughly 20 resumed requests in this example) instead of restarting from scratch.
I agree this is not clear -- I will post the same on #15
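To make the resumption idea concrete, here is a rough sketch (a hypothetical helper, not the actual retry implementation; it assumes a recent object_store release and uses an illustrative 30 second per-chunk timeout):

```rust
// Sketch of "only re-request the bytes remaining": stream the response, track
// the offset reached, and if the stream stalls, issue a new ranged GET starting
// at that offset instead of restarting the whole request. Hypothetical helper;
// assumes a recent object_store release (u64 offsets, GetRange::Bounded).
use std::{ops::Range, sync::Arc, time::Duration};
use bytes::BytesMut;
use futures::StreamExt;
use object_store::{path::Path, GetOptions, GetRange, ObjectStore};

const CHUNK_TIMEOUT: Duration = Duration::from_secs(30); // illustrative value

async fn get_range_resuming(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    range: Range<u64>,
) -> object_store::Result<bytes::Bytes> {
    let mut buf = BytesMut::with_capacity((range.end - range.start) as usize);
    let mut next = range.start;

    while next < range.end {
        // Request only the bytes that have not arrived yet.
        let mut opts = GetOptions::default();
        opts.range = Some(GetRange::Bounded(next..range.end));
        let mut stream = store.get_opts(location, opts).await?.into_stream();

        // Consume the stream; if a chunk takes too long, drop the stream and
        // loop around to resume from the current offset.
        while let Ok(Some(chunk)) = tokio::time::timeout(CHUNK_TIMEOUT, stream.next()).await {
            let bytes = chunk?;
            next += bytes.len() as u64;
            buf.extend_from_slice(&bytes);
        }
    }
    Ok(buf.freeze())
}
```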
That makes a lot of sense, thanks for clarifying! So this means that the same data won't get fetched multiple times, which is nice. Does the user still need to configure a large enough timeout?
I am not sure yet -- it will depend on how the feature is implemented. It is interesting to think about what to do when the process is making (very) slow progress.
Migrating from arrow-rs issue #7251
This is a nice-to-have for us. It's certainly crossed my mind, but we haven't implemented it yet. In some cases, I suspect SlateDB users will want low latency at all costs. In other cases, cost is the main thing. :)
Chunked Reads as requested in this ticket is similar
Pipelines with many delta connectors hit timeout errors, likely due to apache/arrow-rs-object-store#14. Until that is fixed, we introduce a mechanism to restrict the number of concurrent readers across all delta connectors. From the docs: -- Maximum number of concurrent object store reads performed by all Delta Lake connectors. This setting is used to limit the number of concurrent reads of the object store in a pipeline with a large number of Delta Lake connectors. When multiple connectors are simultaneously reading from the object store, this can lead to transport timeouts. When enabled, this setting limits the number of concurrent reads across all connectors. This is a global setting that affects all Delta Lake connectors, and not just the connector where it is specified. It should therefore be used at most once in a pipeline. If multiple connectors specify this setting, they must all use the same value. The default value is 6. Signed-off-by: Leonid Ryzhyk <[email protected]>
Please describe what you are trying to do.
TLDR: let's combine forces rather than all reimplementing caching / chunking / etc in `object_store`!

The `ObjectStore` trait is flexible and it is common to compose a stack of `ObjectStore`s, with one wrapping underlying stores. For example, the `ThrottledStore` and `LimitStore` provided with the object_store crate do exactly this.
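For illustration, a minimal sketch of composing those built-in wrappers (the bucket name, request limit, and throttle delay are made-up values; `LimitStore` caps in-flight requests, while `ThrottledStore` adds artificial latency and is mainly intended for testing):

```rust
// Sketch: wrap an S3 store so that at most 16 requests are in flight at once,
// then add a small artificial delay per GET. Values are illustrative only.
// Requires the object_store crate with the "aws" feature enabled.
use std::time::Duration;
use object_store::{
    aws::AmazonS3Builder,
    limit::LimitStore,
    throttle::{ThrottleConfig, ThrottledStore},
    ObjectStore,
};

fn build_store() -> object_store::Result<Box<dyn ObjectStore>> {
    let s3 = AmazonS3Builder::from_env()
        .with_bucket_name("my-bucket")
        .build()?;

    // Cap concurrent requests across all users of this store.
    let limited = LimitStore::new(s3, 16);

    // Add a fixed delay to every GET request.
    let mut config = ThrottleConfig::default();
    config.wait_get_per_call = Duration::from_millis(10);

    Ok(Box::new(ThrottledStore::new(limited, config)))
}
```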
Many Different Behaviors

There are many types of behaviors that can be implemented this way. Some examples I am aware of:

- `ThrottledStore` and `LimitStore` provided with the object_store crate
- `DeltaIOStorageBackend` in delta-rs from @ion-elgreco
- `LimitedRequestSizeObjectStore` from Timeouts reading "large" files from object stores over "slow" connections (datafusion#15067)
Desired behavior is varied and application specific

Also, depending on the needs of the particular app, the ideal behavior / policy is likely different.
For example,
So the point is that I don't think any one individual policy will work for all use cases (though we can certainly discuss changing the default policy)
Since `ObjectStore` is already composable, I already see projects implementing these types of things independently (for example, delta-rs and influxdb_iox both have cross-runtime object stores, and @mildbyte from splitgraph implemented some sort of visualization of object store requests over time). I believe this is similar to the OpenDAL concept of `layers`, but @Xuanwo please correct me if I am wrong.

Desired Solution
I would like it to be easier for users of object_store to access such features without having to implement custom wrappers in parallel independently.
Alternatives
New object_store_util crate

One alternative is to make a new crate, named `object_store_util` or similar, mirroring `futures-util` and `tokio-util`, that has a bunch of these ObjectStore combinators. This could be housed outside of the apache organization, but I think it would be most valuable for the community if it was inside.
Add additional policies to provided implementations
An alternative is to implement more sophisticated default implementations (for example, add more options to the `AmazonS3` implementation). One upside of this approach is that it could take advantage of implementation-specific features.

One downside is additional code and configuration complexity, especially as the different strategies are all applicable to multiple stores (e.g. GCP, S3 and Azure). Another downside is that specifying the policy might be complex (like specifying concurrency along with chunking and under what circumstances each should be used).
Additional context
`to_pyarrow_table()` on a table in S3 kept getting "Generic S3 error: error decoding response body" delta-io/delta-rs#2595