-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new transaction model based on ephemeral branches #258
Conversation
cfebdaf
to
96f162a
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #258 +/- ##
==========================================
+ Coverage 94.08% 94.24% +0.15%
==========================================
Files 5 5
Lines 406 382 -24
Branches 72 70 -2
==========================================
- Hits 382 360 -22
+ Misses 15 13 -2
Partials 9 9 ☔ View full report in Codecov by Sentry. |
3033d40
to
fcbe093
Compare
Stateful transaction support requested in fsspec/filesystem_spec#1517. |
54b261f
to
9c26a65
Compare
The first building block for transactions on ephemeral branches.
It now takes the repo and branch arguments from the transaction.
The new ephemeral branch transaction is entirely eager, meaning the placeholder is not needed anymore.
Works the same as in the conftest.py temporary branch fixture. Since the branch creation is deferred until context entry, it is currently not possible to start a transaction using the fsspec file system's `start_transaction` API. Ephemeral branch is optionally merged back and deleted again on __exit__.
Since the repository and ephemeral branch are now saved on the transaction, they are no longer needed on the APIs. Commits through `LakeFSTransaction.commit()` are automatically created on the ephemeral branch. The biggest change is for rev_parse, which does not need the complicated commit fetching anymore, and can just take the ref expression directly.
Removes wrapt from the lockfiles, and bumps some Jupyter dependencies that were reported to have security vulnerabilities.
This has the unfortunate side effect that the typing does not match that of the `AbstractFileSystem` base class anymore.
Signified by the absence of the exc_info tuple in __exit__. Also, set `base_branch` as a `lakefs.Branch`, since it needs to be the left side of the diff in `__exit__` - apparently, that is not symmetric.
Requires repository, base branch everywhere. Right now, we solve all file puts by interpolating the string in-place with `tx.branch` instead of `temp_branch`. This is relatively ugly, and could be better with transaction APIs that change state on the branch (puts, rms). However, those need a check that the input path can be relative, since we just by default assume the user wants to put on the ephemeral branch.
Otherwise we are prone to the same error as in the transaction exit. This also has the side effect that we have to canonicalize `into` into a lakefs.Branch, as it has to be the LHS of the `diff()` call. Also fix expectations for the transaction merge test.
Repo and base branch are now entered into the transaction constructor. Interpolated strings for rpaths are now filled with `tx.branch.id` instead of the respective static values.
This loses the named branch in most cases, since the ephemeral branch model means one is already created. This points to that it might be useful to be able to give a name to the ephemeral branch instead of rolling with the autogenerated one.
Previously, the new branch was a result of implicit branch creation in `fs.put_file()` _within_ the context. Now, since we do not allow the base branch to be created in situ, which would be very confusing, we simply create it explicitly before the transaction start. We do this for both NEW_BRANCH (not sure what it is used for in the end) and TRAINING_BRANCH.
Saves a couple of lines (and coverage misses) in the transaction module. Also rename the tag name positional to `name` in `LakeFSTransaction.tag()`.
This seems like the more natural thing to do. Adjusts the tests to expect the new behavior. Also, since the merge logic only optionally creates a new commit on the destination branch, we can unconditionally return the destination HEAD in `LakeFSTransaction.merge()`.
e5fff8c
to
b19118e
Compare
1) by excluding `raise NotImplementedError` from coverage, and 2) by initializing the transaction's repo by a string. Also adds a test for the remaining partial covers by explicitly checking if-branches for the non-default arguments of `automerge` and `delete`.
d135d14
to
d387ec2
Compare
In case there is no upstream solution, there still is the third option, which I explored in my prototype: turning |
This way, users can debug transaction failures by looking at the left over ephemeral branch. Also move up the `fs._intrans` toggle to before branch deletion, since that needs to be put cleanly on __exit__. Set `fs._intrans` to `True` only after successful branch creation, since otherwise we end up with the wrong state.
Makes the transaction backwards compatible with fsspec. It is still undefined behavior to use the transaction without specifying a repository at minimum.
Requires a rewrite of the transaction doc due to the changes.
This way, we do not get hard-to-interpret errors on uses of `None` objects (which is the default for `LakeFSFTransaction.repository`).
We want a simple ValueError saying that the repo does not exist, not an endless stack trace on lakeFS API exceptions.
This is a breaking change, as it requires users to give arguments to the return object of
LakeFSFileSystem.transaction
.Changes the
LakeFSTransaction
class to become a repository+branch-scoped ephemeral branch.File uploads, removals, and commits happen on that ephemeral branch. After transaction exit, the created branch is optionally merged back into the base branch.
The base branch (from which we branch off in the transaction) must exist before start of the transaction.
Some transaction APIs (merge, revert, tag) can also work on branches other than the transaction's ephemeral branch - this behavior might not be well-defined, and is up for debate on removal.
Similarly, on puts and deletes, one currently has to specify the transaction branch directly via the
LakeFSTransaction.branch
property. This can be made nicer by allowing resource paths relative to the transaction's repo and branch in special functions like e.g.LakeFSTransaction.put()
(deferring tofs.put()
),LakeFSTransaction.rm()
(deferring tofs.rm()
), and so on.This pull request turns the
LakeFSTransaction
into a callable, which needs to be called with a repository name at minimum to ensure transaction execution.