Support delete data files in fast append action #1081

Open
mnpw opened this issue Mar 13, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@mnpw
Contributor

mnpw commented Mar 13, 2025

Is your feature request related to a problem or challenge?

The transaction API exposes FastAppendAction for committing to the catalog.

The equality delete writer was added in #703 to support writing equality delete data files. However, FastAppendAction does not support committing equality delete data files.

See the DataContentType check: https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/transaction.rs#L443-L447
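For readers who do not want to follow the link, the kind of guard being described can be sketched roughly as follows. This is a minimal illustration, not the code in transaction.rs: `DataContentType` and `DataFile` here are simplified stand-ins defined locally, and `validate_fast_append` is a hypothetical helper name.

```rust
// Simplified stand-in for the crate's content type enum (assumption).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

// Simplified stand-in for a file tracked by a manifest (assumption).
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    content_type: DataContentType,
}

// Hypothetical guard: reject anything that is not plain data,
// which is why equality delete files cannot go through a fast append.
fn validate_fast_append(files: &[DataFile]) -> Result<(), String> {
    for file in files {
        if file.content_type != DataContentType::Data {
            return Err(format!(
                "fast append only accepts data files, got {:?} for {}",
                file.content_type, file.path
            ));
        }
    }
    Ok(())
}
```

With a guard like this, a batch containing a single `EqualityDeletes` file is rejected as a whole, which matches the behavior this issue asks to change.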

Describe the solution you'd like

It would be great if we could enhance FastAppendAction to support committing equality delete data files as well. I am willing to work on this.

Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community

@mnpw mnpw added the enhancement New feature or request label Mar 13, 2025
@jonathanc-n
Contributor

@mnpw take a look at #1017 before proceeding.

@liurenjie1024
Contributor

Thanks @mnpw for raising this. But this should not be included in fast append action since deletion typically requires conflict detection.

@liurenjie1024
Contributor

We should close this issue since it does not follow Iceberg's design.

@mnpw
Contributor Author

mnpw commented Mar 17, 2025

> But this should not be included in fast append action since deletion typically requires conflict detection.

@liurenjie1024 What do you think about a transaction action for only delete files, perhaps RowDeleteAction just like FastAppendAction? RowDeleteAction can take DataContentType::EqualityDeletes files and commit them into a new snapshot. On conflict during snapshot creation this action can behave similarly to FastAppendAction.

My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

@liurenjie1024
Contributor

> On conflict during snapshot creation this action can behave similarly to FastAppendAction.

Sorry, I don't get this point. FastAppendAction will never conflict with other transactions. By conflict detection I mean ensuring snapshot isolation during concurrent writes: https://iceberg.apache.org/docs/nightly/reliability/#concurrent-write-operations

> My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

Do you mean writing delete files only? I can understand such a case, but we still need to do conflict detection for concurrent writes.

@ZENOTME
Contributor

ZENOTME commented Mar 18, 2025

I think the intent of #798 is similar to this issue. We end up needing to implement RowDeltaAction for this intent.

> @liurenjie1024 What do you think about a transaction action for only delete files, perhaps RowDeleteAction just like FastAppendAction? RowDeleteAction can take DataContentType::EqualityDeletes files and commit them into a new snapshot. On conflict during snapshot creation this action can behave similarly to FastAppendAction.

For an action that commits only delete files, new data may be appended (or overwritten) concurrently between the time the delete files are written and the time they are committed. The delete files would then affect the newly appended data and cause undefined behavior. So it looks like we can't avoid conflict detection. We can open an issue to track this.
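To make the concern concrete, here is a rough sketch under stated assumptions. Per the Iceberg spec, an equality delete applies to data files whose data sequence number is strictly lower than the delete file's, so a delete committed after a concurrent append also covers rows its writer never saw. The `Snapshot` struct and `validate_no_concurrent_appends` are hypothetical names for illustration, not iceberg-rust APIs, and a real check would likely narrow the conflict to the affected partitions or predicate.

```rust
// Spec rule: an equality delete applies to a data file only if the data
// file's data sequence number is strictly lower than the delete file's.
fn equality_delete_applies(delete_seq: u64, data_seq: u64) -> bool {
    data_seq < delete_seq
}

// Hypothetical, simplified view of a committed snapshot (assumption).
struct Snapshot {
    id: u64,
    sequence_number: u64,
    added_data_files: usize,
}

// Hypothetical conflict check: fail if any snapshot committed after the
// writer's base snapshot added data files the delete writer never saw.
fn validate_no_concurrent_appends(base_seq: u64, history: &[Snapshot]) -> Result<(), String> {
    for snapshot in history.iter().filter(|s| s.sequence_number > base_seq) {
        if snapshot.added_data_files > 0 {
            return Err(format!(
                "snapshot {} added {} data file(s) after base sequence number {}",
                snapshot.id, snapshot.added_data_files, base_seq
            ));
        }
    }
    Ok(())
}
```

A fast append never needs a check like the second function, which is the reason given above for keeping delete files out of FastAppendAction.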

> My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

Before we complete RowDeleteAction, personally I think you could try hacking the fast append to append the delete files, as in #798 (if you just want to try some simple cases). It only works in simple cases (e.g. no concurrent writes), but that doesn't mean it's right.

@ZENOTME
Contributor

ZENOTME commented Mar 18, 2025

Created an issue to track RowDeleteAction: #1104
