Xarray Serialisation Issues reading NetCDF from AzureBlobFile #477

Open
alex-rakowski opened this issue Jun 13, 2024 · 3 comments

@alex-rakowski

I'm trying to read a NetCDF file with xarray and am running into serialisation issues.

The AzureBlobFile object contains a SimpleQueue, which is non-trivial to serialise. I suspect that fsspec should be handling the serialisation differently.

Simple Reproducer:

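import fsspec
import xarray as xr
from distributed.protocol import serialize, ToPickle

# Credentials redacted in the original report.
storage_options = {'connection_string': '***', 'account_key': '***'}
fs = fsspec.filesystem('abfs', **storage_options)
url = "<CONTAINER_NAME>"
files = fs.ls(url)
# Open the first blob as a lazily loaded, dask-chunked dataset.
ds = xr.open_dataset(
    fs.open(files[0], 'rb'),
    chunks={'x': 2000, 'y': 2000},
    engine='h5netcdf',
)
# Serialising the dask graph of the first variable triggers the error.
serialize(ToPickle(list(ds.variables.values())[0]._data.dask))

@TomAugspurger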
Contributor

Can you post the full traceback? What object has a reference to the queue?
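
One way to answer the question of which object holds the unpicklable reference is to walk the attributes of the failing object and try to pickle each one in turn. The following is a minimal sketch under stated assumptions: `find_unpicklable` and `FakeBlobFile` are hypothetical names invented for illustration, and `FakeBlobFile` is only a stand-in, not the real AzureBlobFile.

```python
import pickle
import queue


def find_unpicklable(obj, path="obj", max_depth=5):
    """Return (path, error) pairs for the innermost objects that fail to pickle."""
    try:
        pickle.dumps(obj)
        return []  # pickles fine, nothing to report
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"

    # Collect children to inspect: dict values, sequence items, or attributes.
    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    found = []
    if max_depth > 0:
        for child_path, child in children:
            found.extend(find_unpicklable(child, child_path, max_depth - 1))
    # If no child was identified as the cause, report this object itself.
    return found or [(path, error)]


class FakeBlobFile:
    """Hypothetical stand-in for a file object holding a queue."""

    def __init__(self):
        self.path = "container/data.nc"
        self.cache = {"blocks": [b"abc"], "pending": queue.SimpleQueue()}


report = find_unpicklable(FakeBlobFile(), path="f")
for attr_path, error in report:
    print(attr_path, "->", error)
```

Applied to the object from the reproducer, a helper like this would be called on the dask graph or on the opened file object. It only follows `__dict__`, dicts, lists, and tuples, so state hidden in closures or C extensions would be reported at the nearest enclosing object.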

@alex-rakowski
Author

2024-06-13 12:48:57,917 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 2 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x31490b130>
 0. original-open_dataset-FSC-2bd87bcfc4ee55630c36125387cfd518
 1. open_dataset-FSC-2bd87bcfc4ee55630c36125387cfd518
>.
Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'weakref.ReferenceType' object

The unpicklable object is reported as 'weakref.ReferenceType' here, but it sometimes shows up as SimpleQueue instead when doing something more realistic with the dataset than the simple reproducer does.
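
Both types mentioned in this thread are rejected by the standard pickler, so the type named in the error presumably depends on which one the pickler reaches first while traversing the graph. A small standalone check, independent of adlfs and xarray (the `Target` class is a hypothetical placeholder):

```python
import pickle
import queue
import weakref


class Target:
    """Hypothetical placeholder; weak references need a weakref-able object."""


target = Target()
candidates = {
    "SimpleQueue": queue.SimpleQueue(),
    "weakref": weakref.ref(target),
}

errors = {}
for name, obj in candidates.items():
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        # Record the message so the two failures can be compared.
        errors[name] = str(exc)

for name, message in errors.items():
    print(f"{name}: {message}")
```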

@TomAugspurger
Contributor

Thanks. We'll need to figure out which attributes of which objects aren't picklable. Some of these (like things from azure.storage.blob or azure.identity) might need to be pushed upstream. Others might need to be fixed here. Any research you can do here would be helpful.
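
For attributes that turn out to be runtime-only state, one common pattern is to drop them in `__getstate__` and rebuild them in `__setstate__`. The sketch below illustrates that general Python technique on a hypothetical `FakeRemoteFile` class; it is not the adlfs implementation, and whether the queue in AzureBlobFile can safely be recreated empty is an assumption that would need checking.

```python
import pickle
import queue


class FakeRemoteFile:
    """Hypothetical file-like wrapper; not adlfs's AzureBlobFile."""

    def __init__(self, path):
        self.path = path
        self.loc = 0
        self._pending = queue.SimpleQueue()  # runtime-only, unpicklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_pending"]  # leave out what pickle cannot handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._pending = queue.SimpleQueue()  # rebuild a fresh, empty queue


original = FakeRemoteFile("container/data.nc")
original.loc = 128
restored = pickle.loads(pickle.dumps(original))
print(restored.path, restored.loc, restored._pending.empty())
```

Anything queued at pickling time is lost with this approach, which is acceptable only if the queue holds transient work rather than data the reader still needs.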
