Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(fix) Add write to opening mode in object writer #7961

Merged
merged 5 commits into from
Jul 16, 2024

Conversation

maxmynter
Copy link
Contributor

Python's open needs a specification of one of
create/read/write/append.

Closes #7928

Change Description

Add write mode to binary open.

Background

Share context and relevant information for the PR: offline discussions, considerations, design decisions etc.

Bug Fix

If this PR is a bug fix, please let us know about:

  1. Problem -
    Python needs specification of create/read/write/append.
  2. Root cause - Discovered root cause after investigation
    The opening mode specification was missing.
  3. Solution - How the bug was fixed
    Added the opening mode specification

Python's `open` needs a specification of one of
create/read/write/append.
@CLAassistant
Copy link

CLAassistant commented Jul 5, 2024

CLA assistant check
All committers have signed the CLA.

@maxmynter
Copy link
Contributor Author

@ozkatz

Any change requests to this?

@itaiad200 itaiad200 requested review from N-o-Z and a team July 9, 2024 10:51
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting this slip by me and the resultant delay in reviewing. Thanks!

Can you figure out a way to test this in clients/python-wrapper/tests/integration/test_object.py? If we made the error once, we might make it again. Please don't waste too much time on it, though; if you have no quick way, just open an issue.\

Again, thanks!

@arielshaqed arielshaqed added bug Something isn't working include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached contributor area/sdk/python python-wrapper labels Jul 10, 2024
@arielshaqed
Copy link
Contributor

Running Esti through a clone #7978.

@arielshaqed arielshaqed added pr/do-not-merge bug Something isn't working contributor include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached python-wrapper and removed bug Something isn't working contributor area/sdk/python include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached python-wrapper pr/do-not-merge labels Jul 10, 2024
otherwise, there will be errors for writes to the SpooledTemporaryFile
larger than the buffersize as encoding shall not be set for binary files.
The write mode is hardcoded to `wb+` due to an upstream bug.
for files larger than _WRITER_BUFFER_SIZE (hard coded value in
/lakefs/object.py, set to  32 MB) Python's SpooledTemporaryFile
forces a write to disk. Only then the kwargs passed to it are applied.
@maxmynter
Copy link
Contributor Author

I've added a test in 62c1f63, with those i found that conditionally setting the encoding results in errors if the write mode is hardcode. Binary writes throw errors with an encoding set.

I've removed the conditional in 9eadf81. Once no longer affected by the urllib bug, we should consider adding both back.

@maxmynter maxmynter requested a review from arielshaqed July 11, 2024 14:23
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Let me know if you want to add the requested TODO, or want me to do it. With that we're good to go.

MANY THANKS for the PR and your patience - it is very much appreciated.

Comment on lines 475 to 477
# Create file of size greater than _WRITER_BUFFER_SIZE (32 MB)
# set in SpooledTemporaryFile to force disk write
binary_data = b"\x00" * (32 * 1024 * 1024 + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good stuff but unfortunate: we should be able to set a lower _WRITER_BUFFER_SIZE for this test, for instance, by passing ObjectWriter a max_size parameter.

Out of scope for this PR, so please add

Suggested change
# Create file of size greater than _WRITER_BUFFER_SIZE (32 MB)
# set in SpooledTemporaryFile to force disk write
binary_data = b"\x00" * (32 * 1024 * 1024 + 1)
# Create file of size greater than _WRITER_BUFFER_SIZE (32 MB)
# set in SpooledTemporaryFile to force disk write
#
# TODO(arielshaqed/n-o-z): set a smaller spool size.
binary_data = b"\x00" * (32 * 1024 * 1024 + 1)

which are out of scope for this PR. Namely, reintroducing the
write mode depending on user specification once we are no longer
affected by the upstream urllib bug and a tuning parameter for the
SpooledTemporaryFile size in the tests.
@maxmynter maxmynter requested a review from arielshaqed July 12, 2024 09:01
@maxmynter
Copy link
Contributor Author

Happy to help :)

All done from my side!

@arielshaqed
Copy link
Contributor

Thanks!

Sorry for delays, will pull shortly. @N-o-Z we can address this comment right after; I will open an issue.

@arielshaqed arielshaqed enabled auto-merge (squash) July 15, 2024 12:11
auto-merge was automatically disabled July 15, 2024 15:14

Head branch was pushed to by a user without write access

@maxmynter
Copy link
Contributor Author

I had different linter settings and with hunk staging forgot imports to the tests.
Should be good now.

Sorry.

@arielshaqed arielshaqed removed the minor-change Used for PRs that don't require issue attached label Jul 16, 2024
@arielshaqed arielshaqed merged commit b1bf8d1 into treeverse:master Jul 16, 2024
69 of 71 checks passed
@arielshaqed
Copy link
Contributor

System test passes :-) on my copy of this branch.

Pulling this one, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working contributor include-changelog PR description should be included in next release changelog python-wrapper
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Faulty open mode breaks file uploads using the ObjectWriter.
3 participants