feat: expose peek next commit function to python #1937
base: main
Force-pushed from 167e954 to 359306a.
...able_missing_commit/.part-00000-2befed33-c358-4768-a43c-3eda0d2a499d-c000.snappy.parquet.crc
Haven't looked at the actual PR yet, but there is one thing we should consider. We are currently rewriting the log-replay logic, and the new approach does not work in the same way. We will eventually also need some concept of incrementally updating the state, so some notion of peeking at the next commit may remain, but it may look somewhat different. In short, if we expose this, there is not much of a guarantee that it will survive the next couple of releases...
I see, thanks for letting me know. Is there a document that explains the rewrite?
Not a publicly accessible one yet. The basic logic is quite simple though.
After that we iterate through all actions and keep track of the "seen" actions.
Force-pushed from 359306a to b402f3b.
@PengLiVectra - about to start reviving this. To keep things smaller, could we update the test table so that it does not contain so many files, only the commits we need for that specific scenario?
Force-pushed from b402f3b to 005d52a.
This would be considered a seriously corrupted table. Did you encounter this scenario using delta-rs, or any other writer for that matter?
Sometimes a commit might be deleted accidentally. In that case we don't want it to break streaming, so we skip it.
Deleted most of the files to keep the test data small.
Description
Expose peek_next_commit to Python. It can be used for streaming Delta commit changes.
peek_next_commit returns the actions in the next commit together with that commit's version. If the current version is the latest version, it returns None for the actions along with the current version. If a commit is missing, it skips the missing commit. For example, given commits 0, 1, 2, 4, 5, peek_next_commit(2) returns commit 4 and skips the missing commit 3.
An example of usage:
actions, next_version = delta_table.peek_next_commit(current_version)
If current_version is the latest version, the return value is (None, current_version).
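To illustrate the semantics described above, here is a minimal sketch of a polling loop a streaming consumer might build on top of this call. `FakeTable` and `drain_commits` are hypothetical names invented for this example; `FakeTable` is an in-memory stand-in that only mimics the behavior stated in this description (return the next existing commit, skip missing versions, return None at the latest version) and is not the delta-rs implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

Actions = List[Dict[str, Any]]


class FakeTable:
    """Hypothetical stand-in mimicking the described peek_next_commit semantics."""

    def __init__(self, commits: Dict[int, Actions]) -> None:
        self._commits = commits

    def peek_next_commit(self, version: int) -> Tuple[Optional[Actions], int]:
        # Find the smallest existing commit version greater than `version`;
        # missing versions in between are skipped.
        later = sorted(v for v in self._commits if v > version)
        if not later:
            # Already at the latest version: no actions, version unchanged.
            return None, version
        next_version = later[0]
        return self._commits[next_version], next_version


def drain_commits(table: Any, current_version: int) -> Tuple[List[Tuple[int, Actions]], int]:
    """Collect (version, actions) pairs until the table reports no newer commit."""
    seen: List[Tuple[int, Actions]] = []
    while True:
        actions, next_version = table.peek_next_commit(current_version)
        if actions is None:
            return seen, current_version
        seen.append((next_version, actions))
        current_version = next_version


# Commits 0, 1, 2, 4, 5 exist; commit 3 is missing.
table = FakeTable({v: [{"commitInfo": {"version": v}}] for v in (0, 1, 2, 4, 5)})
seen, last_version = drain_commits(table, 2)
print([v for v, _ in seen], last_version)  # prints "[4, 5] 5"
```

A real consumer would presumably sleep and retry when the actions are None instead of returning, and would pass the actual table object in place of `FakeTable`.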
Related Issue(s)
#1886
Testing
Added unit tests; they pass.
Breaking-Change
Not a breaking change.