Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(docs): Add lakeFS-spec and lakeFS-SDK transaction differences #252

Merged
merged 5 commits into from
Jan 18, 2024

Conversation

maxmynter
Copy link
Collaborator

Copy link

codecov bot commented Jan 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (15c3526) 94.10% compared to head (78de589) 94.08%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
- Coverage   94.10%   94.08%   -0.02%     
==========================================
  Files           5        5              
  Lines         407      406       -1     
  Branches       72       72              
==========================================
- Hits          383      382       -1     
  Misses         15       15              
  Partials        9        9              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxmynter
Copy link
Collaborator Author

I cannot directly comment on code not (yet) affected by this PR.

In the complete function of the LakeFSTransaction context class ( line 162 in the transactions.py ) we set self.fs._intrans = False. Is that necessary, since the __exit__ function of the Transaction parent class from fsspec already sets this flag to false after calling complete?

@maxmynter maxmynter requested a review from nicholasjng January 16, 2024 15:06
@nicholasjng
Copy link
Collaborator

In the complete function of the LakeFSTransaction context class ( line 162 in the transactions.py ) we set self.fs._intrans = False. Is that necessary, since the __exit__ function of the Transaction parent class from fsspec already sets this flag to false after calling complete?

I guess it's not necessary, but it does not hurt, either, so I'm unsure. If you can verify that this does not break any tests, I'm happy to see it removed.

@@ -6,6 +6,11 @@ A transaction is basically a context manager that collects all file uploads, def
They are an "all or nothing" proposition: If an error occurs during the transaction, none of the queued files are uploaded.
For more information on fsspec transactions, see the official [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#transactions).

!!! info
The transactions in `lakeFS-spec` are different from the transactions in the [lakeFS-SDK](https://docs.lakefs.io/integrations/python.html#transactions), which were added in `v0.2.0`.
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
- High-level `lakefs` SDK transactions create an ephemeral branch, perform the operations in the context block on that ephemeral branch and, upon exiting the context, merge it back into the source branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out the backticks (as mentioned above).

!!! info
The transactions in `lakeFS-spec` are different from the transactions in the [lakeFS-SDK](https://docs.lakefs.io/integrations/python.html#transactions), which were added in `v0.2.0`.
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows, for example, multiple commits per transaction.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows, for example, multiple commits per transaction.
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows more flexible operations, for example, creating multiple commits per transaction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, no backticks.

@AdrianoKF AdrianoKF added the documentation Improvements or additions to documentation label Jan 17, 2024
@@ -6,6 +6,11 @@ A transaction is basically a context manager that collects all file uploads, def
They are an "all or nothing" proposition: If an error occurs during the transaction, none of the queued files are uploaded.
For more information on fsspec transactions, see the official [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#transactions).

!!! info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placement of this callout is a bit (visually) unfortunate, IMO, since it separates the last sentence of the introduction from the rest, creating a sort-of orphan.

Instead of a list inside the callout, I would simply have two sentences (maybe with * * emphasis on the package names).

In case you want to keep the list, please include a blank link before the first list item, otherwise the list does not render correctly in Mkdocs.

@@ -6,6 +6,11 @@ A transaction is basically a context manager that collects all file uploads, def
They are an "all or nothing" proposition: If an error occurs during the transaction, none of the queued files are uploaded.
For more information on fsspec transactions, see the official [documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#transactions).

!!! info
The transactions in `lakeFS-spec` are different from the transactions in the [lakeFS-SDK](https://docs.lakefs.io/integrations/python.html#transactions), which were added in `v0.2.0`.
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out the backticks (as mentioned above).

!!! info
The transactions in `lakeFS-spec` are different from the transactions in the [lakeFS-SDK](https://docs.lakefs.io/integrations/python.html#transactions), which were added in `v0.2.0`.
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows, for example, multiple commits per transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, no backticks.

!!! info
The transactions in `lakeFS-spec` are different from the transactions in the [lakeFS-SDK](https://docs.lakefs.io/integrations/python.html#transactions), which were added in `v0.2.0`.
- `lakeFS-SDK` transactions create an ephemeral branch, perform the operations in the context block on this branch and, upon exiting the context manager, merge it back into the source branch.
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows, for example, multiple commits per transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows, for example, multiple commits per transaction.
- `lakeFS-spec` transactions collect the actions and perform them one by one directly on the source branch once the context manager is exited. This allows for a more fine-grained control over the versioning operations performed. For example, multiple commits can be performed as part of a single transaction.

since that is already taken care of in the __exit__ method of the parent class
@maxmynter maxmynter merged commit 36ceed4 into main Jan 18, 2024
5 checks passed
@maxmynter maxmynter deleted the sdk-vs-lakefsspec-transaction branch January 18, 2024 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants