allow for roundtrips of cloudpaths through pickle serialization #454

Merged: pjbull merged 1 commit into drivendataorg:454-local on Aug 15, 2024

kujenga (Contributor) commented Jul 21, 2024

This avoids an exception thrown because the `_client` is not serialized into the pickled object, and thus when `__getstate__` is called the second time, there is no `_client` field to delete.

Closes #450
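
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use cloudpathlib itself: `BuggyPath`, `FixedPath`, and `roundtrip` are hypothetical stand-ins, and the `state.pop("_client", None)` guard is only one plausible way to tolerate a missing field, not necessarily the exact change made in this PR.

```python
import pickle


class BuggyPath:
    """Hypothetical stand-in for a path object that holds an unpicklable client."""

    def __init__(self, url):
        self.url = url
        self._client = object()  # placeholder for a live client/connection

    def __getstate__(self):
        state = self.__dict__.copy()
        # Unconditional delete: raises KeyError if "_client" is already absent,
        # which is the case for an object that was itself restored from a pickle.
        del state["_client"]
        return state

    def __setstate__(self, state):
        # The restored object has no "_client" attribute.
        self.__dict__.update(state)


class FixedPath(BuggyPath):
    """Same stand-in, but tolerant of a missing "_client" (assumed fix)."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_client", None)  # no error if the key is not there
        return state


def roundtrip(obj):
    """Serialize and immediately deserialize an object with pickle."""
    return pickle.loads(pickle.dumps(obj))


# The first roundtrip succeeds; the second one hits the unconditional delete.
once = roundtrip(BuggyPath("s3://bucket/key"))
try:
    roundtrip(once)
except KeyError as exc:
    print("second roundtrip failed:", repr(exc))

# With the guarded version, repeated roundtrips succeed.
twice = roundtrip(roundtrip(FixedPath("s3://bucket/key")))
print("fixed path survived two roundtrips:", twice.url)
```

In this sketch the restored object simply lacks a `_client` attribute, so any code that needs a client afterwards has to recreate or look one up lazily.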


Contributor checklist:

  • I have read and understood CONTRIBUTING.md
  • Confirmed an issue exists for the PR, and the text Closes #issue appears in the PR summary (e.g., Closes #123).
  • Confirmed PR is rebased onto the latest base
  • Confirmed failure before change and success after change
  • Any generic new functionality is replicated across cloud providers if necessary
  • Tested manually against live server backend for at least one provider
  • Added tests for any new functionality
  • Linting passes locally
  • Tests pass locally
  • Updated HISTORY.md with the issue that is addressed and the PR you are submitting. If the top section is not `## UNRELEASED`, then you need to add a new section to the top of the document for your change.

from cloudpathlib import CloudPath


def test_pickle_roundtrip():
Contributor Author

If there is an existing file that this test would be better suited to, let me know and I'm happy to move it!

I verified that this does repro the issue without the above change.

Member

Separate file for serializing is fine. Can you also move this test over to the new file so they are together?

def test_pickle(rig, tmpdir):
    p = rig.create_cloud_path("dir_0/file0_0.txt")
    with (tmpdir / "test.pkl").open("wb") as f:
        pickle.dump(p, f)
    with (tmpdir / "test.pkl").open("rb") as f:
        pickled = pickle.load(f)
    # test a call to the network
    assert pickled.exists()
    # check we unpickled, and that client is the default client
    assert str(pickled) == str(p)
    assert pickled.client == p.client
    assert rig.client_class._default_client == pickled.client

@kujenga changed the title from "allow for roundtrip of cloudpaths through pickle serialization" to "allow for roundtrips of cloudpaths through pickle serialization" on Jul 21, 2024
This avoids an exception thrown because the _client is not serialized
into the pickled object, and thus when __getstate__ is called the second
time, there is no _client field to delete.

Closes drivendataorg#450
@kujenga kujenga force-pushed the at/pickle-roundtrip-safety branch from 47490cf to 514d447 Compare July 21, 2024 02:53

codecov bot commented Jul 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.4%. Comparing base (08b018b) to head (514d447).
Report is 3 commits behind head on 454-local.

Additional details and impacted files
@@             Coverage Diff             @@
##           454-local    #454     +/-   ##
===========================================
- Coverage       93.7%   93.4%   -0.4%     
===========================================
  Files             23      23             
  Lines           1654    1655      +1     
===========================================
- Hits            1551    1546      -5     
- Misses           103     109      +6     
Files Coverage Δ
cloudpathlib/cloudpath.py 93.5% <100.0%> (-0.4%) ⬇️

... and 2 files with indirect coverage changes

pjbull (Member) commented Jul 21, 2024

Looks good, thanks! One small ask to move the other pickle test to the same place

pjbull (Member) commented Aug 11, 2024

@kujenga Planning a release soon—will you have a chance to wrap this PR?

@pjbull pjbull changed the base branch from master to 454-local August 15, 2024 00:09
@pjbull pjbull merged commit 137bb0a into drivendataorg:454-local Aug 15, 2024
23 of 24 checks passed
jayqi pushed a commit that referenced this pull request Aug 21, 2024
* allow for roundtrips of cloudpaths through pickle serialization (#454)

This avoids an exception thrown because the _client is not serialized
into the pickled object, and thus when __getstate__ is called the second
time, there is no _client field to delete.

Closes #450

* pickle tests

---------

Co-authored-by: Aaron Taylor <[email protected]>
kujenga (Contributor Author) commented Aug 31, 2024

Thank you very much! Apologies I missed the earlier notification

Development

Successfully merging this pull request may close these issues.

Multiple pickle roundtrip serializations cause KeyError
2 participants