[Feature Request] [S3DynamoDBLogStore] S3 Conditional Writes #3596

Open
2 of 9 tasks
AlJohri opened this issue Aug 23, 2024 · 0 comments
Labels
enhancement New feature or request

AlJohri commented Aug 23, 2024

Feature request

Now that S3 supports conditional writes, it would be great to enable multi-cluster writes without the need for S3DynamoDBLogStore.

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • S3DynamoDBLogStore
  • Other (fill in here)

Overview

For years, S3 lacked a put-if-absent API. The launch of S3 conditional writes changes that. From the launch notes:

Amazon S3 adds support for conditional writes that can check for the existence of an object before creating it. This capability can help you more easily prevent applications from overwriting any existing objects when uploading data. You can perform conditional writes using PutObject or CompleteMultipartUpload API requests in both general purpose and directory buckets.

Using conditional writes, you can simplify how distributed applications with multiple clients concurrently update data in parallel across shared datasets. Each client can conditionally write objects, making sure that it does not overwrite any objects already written by another client. This means you no longer need to build any client-side consensus mechanisms to coordinate updates or use additional API requests to check for the presence of an object before uploading data. Instead, you can reliably offload such validations to S3, enabling better performance and efficiency for large-scale analytics, distributed machine learning, and other highly parallelized workloads. To use conditional writes, you can add the HTTP if-none-match conditional header along with PutObject and CompleteMultipartUpload API requests.
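To make the request concrete, here is a minimal sketch of what a put-if-absent commit could look like from a client's point of view. It is not how Delta's LogStore is implemented; `put_if_absent` and `commit_file_key` are hypothetical helper names, and the client is assumed to be a boto3 S3 client recent enough to accept the `IfNoneMatch` parameter on `put_object`. The error is read through the `.response` attribute that botocore's `ClientError` carries, so the sketch itself does not import boto3.

```python
from typing import Any


def commit_file_key(table_path: str, version: int) -> str:
    """Build the key of a Delta commit file (20-digit zero-padded version)."""
    return f"{table_path}/_delta_log/{version:020d}.json"


def put_if_absent(s3_client: Any, bucket: str, key: str, body: bytes) -> bool:
    """Create `key` only if it does not exist yet (hypothetical helper).

    Returns True if this caller created the object, and False if S3
    rejected the write because the object was already there.
    """
    try:
        # IfNoneMatch="*" is sent as the HTTP header `If-None-Match: *`.
        s3_client.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
        return True
    except Exception as exc:
        # botocore's ClientError exposes the parsed error under `.response`.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "PreconditionFailed":
            # HTTP 412: another writer already committed this version.
            return False
        # Anything else (throttling, auth, ...) is not a commit conflict.
        raise
```

With a real client this would be called as `put_if_absent(boto3.client("s3"), bucket, commit_file_key(table_path, version), payload)`. A writer that gets `False` back would treat it as a lost race for that version and retry against the next one, which is the role the DynamoDB table plays today.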

Motivation

This should simplify the use of Delta tables on S3 by enabling multi-cluster writes out of the box, without any additional dependencies.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@AlJohri AlJohri added the enhancement New feature or request label Aug 23, 2024