Description
The following code causes a crash, which can arise when passing cloudpathlib objects to/from subprocesses.
path1 = CloudPath("s3://bucket/key")
print(path1.__dict__)
pkl1 = pickle.dumps(path1)
path2 = pickle.loads(pkl1)
print(path2.__dict__)
pkl2 = pickle.dumps(path2)
path3 = pickle.loads(pkl2)
print(path3.__dict__)
The exception raised is:
{'_handle': None, '_client': None, '_str': 's3://bucket/key', '_url': ParseResult(scheme='s3', netloc='bucket', path='/key', params='', query='', fragment=''), '_path': PurePosixPath('/bucket/key'), '_dirty': False}
{'_handle': None, '_str': 's3://bucket/key', '_url': ParseResult(scheme='s3', netloc='bucket', path='/key', params='', query='', fragment=''), '_path': PurePosixPath('/bucket/key'), '_dirty': False}
Traceback (most recent call last):
File "/path/to/cloudpathlib-pickle-crash.py", line 20, in <module>
main()
File "/path/to/cloudpathlib-pickle-crash.py", line 13, in main
pkl2 = pickle.dumps(path2)
^^^^^^^^^^^^^^^^^^^
File "/path/to/.venv/lib/python3.11/site-packages/cloudpathlib/cloudpath.py", line 266, in __getstate__
del state["_client"]
~~~~~^^^^^^^^^^^
KeyError: '_client'
The issue seems to be that these methods are not symmetrical, and the _client field is not restored when the object is unpickled in a way that might mirror this example in the docs https://docs.python.org/3/library/pickle.html#pickle-state at this point in the code:
cloudpathlib/cloudpathlib/cloudpath.py
Lines 262 to 272 in 08b018b