Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple pickle roundtrip serializations cause KeyError #450

Closed
kujenga opened this issue Jul 3, 2024 · 4 comments · Fixed by #454 or #460
Closed

Multiple pickle roundtrip serializations cause KeyError #450

kujenga opened this issue Jul 3, 2024 · 4 comments · Fixed by #454 or #460
Labels
bug Something isn't working

Comments

@kujenga
Copy link
Contributor

kujenga commented Jul 3, 2024

The following code causes a crash, which can arise when passing cloudpathlib objects to/from subprocesses.

    path1 = CloudPath("s3://bucket/key")
    print(path1.__dict__)
    pkl1 = pickle.dumps(path1)

    path2 = pickle.loads(pkl1)
    print(path2.__dict__)
    pkl2 = pickle.dumps(path2)

    path3 = pickle.loads(pkl2)
    print(path3.__dict__)

The exception raised is:

{'_handle': None, '_client': None, '_str': 's3://bucket/key', '_url': ParseResult(scheme='s3', netloc='bucket', path='/key', params='', query='', fragment=''), '_path': PurePosixPath('/bucket/key'), '_dirty': False}
{'_handle': None, '_str': 's3://bucket/key', '_url': ParseResult(scheme='s3', netloc='bucket', path='/key', params='', query='', fragment=''), '_path': PurePosixPath('/bucket/key'), '_dirty': False}
Traceback (most recent call last):
  File "/path/to/cloudpathlib-pickle-crash.py", line 20, in <module>
    main()
  File "/path/to/cloudpathlib-pickle-crash.py", line 13, in main
    pkl2 = pickle.dumps(path2)
           ^^^^^^^^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.11/site-packages/cloudpathlib/cloudpath.py", line 266, in __getstate__
    del state["_client"]
        ~~~~~^^^^^^^^^^^
KeyError: '_client'

The issue seems to be that these methods are not symmetrical, and the _client field is not restored when the object is unpickled in a way that might mirror this example in the docs https://docs.python.org/3/library/pickle.html#pickle-state at this point in the code:

def __getstate__(self) -> Dict[str, Any]:
state = self.__dict__.copy()
# don't pickle client
del state["_client"]
return state
def __setstate__(self, state: Dict[str, Any]) -> None:
self.__dict__.update(state)

@jayqi jayqi added the bug Something isn't working label Jul 3, 2024
@pjbull
Copy link
Member

pjbull commented Jul 3, 2024

Thanks @kujenga.

Since _client is created on demand I think the fix here is just to update __getstate__ with:

if "_client" in state:
    del state["_client"]

Would be happy to take a PR that does that and adds a test.

Thanks!

kujenga added a commit to kujenga/cloudpathlib that referenced this issue Jul 21, 2024
This avoids an exception thrown because the client is not serialized
into the pickeld object, and thus when __getstate__ is called the second
time, there is no _client field to delete.

Closes drivendataorg#450
@kujenga
Copy link
Contributor Author

kujenga commented Jul 21, 2024

Sounds good! PR submitted here: #454

kujenga added a commit to kujenga/cloudpathlib that referenced this issue Jul 21, 2024
This avoids an exception thrown because the _client is not serialized
into the pickled object, and thus when __getstate__ is called the second
time, there is no _client field to delete.

Closes drivendataorg#450
pjbull pushed a commit that referenced this issue Aug 15, 2024
This avoids an exception thrown because the _client is not serialized
into the pickled object, and thus when __getstate__ is called the second
time, there is no _client field to delete.

Closes #450
@jayqi jayqi closed this as completed in 7cbff39 Aug 21, 2024
@pjbull
Copy link
Member

pjbull commented Aug 29, 2024

@kujenga This is realeased now in v0.19.0 if you want to upgrade and test your use case.

@kujenga
Copy link
Contributor Author

kujenga commented Aug 31, 2024

Yes, this is working great for us! Thanks so much

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants