Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open_virtual_dataset with and without indexes #52

Merged
merged 16 commits into from
Mar 27, 2024
Merged

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Mar 26, 2024

Closes #18

This PR ensures that you can either create a virtual dataset with real in-memory pandas indexes using open_virtual_dataset(..., indexes=None), or avoid creating indexes entirely using open_virtual_dataset(..., indexes={}). The kwarg signature here was chosen to match what is planned for xr.open_dataset, see pydata/xarray#6633.

@TomNicholas TomNicholas added the xarray Requires changes to xarray upstream label Mar 26, 2024
Comment on lines +51 to 69
vds_refs = kerchunk.read_kerchunk_references_from_file(
filepath=filepath,
filetype=filetype,
)

ds = dataset_from_kerchunk_refs(
ds_refs,
if indexes is None:
# add default indexes by reading data from file
# TODO we are reading a bunch of stuff we know we won't need here, e.g. all of the data variables...
# TODO it would also be nice if we could somehow consolidate this with the reading of the kerchunk references
ds = xr.open_dataset(filepath)
indexes = ds.xindexes
ds.close()

vds = dataset_from_kerchunk_refs(
vds_refs,
drop_variables=drop_variables,
virtual_array_class=virtual_array_class,
indexes=indexes,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're literally opening the file twice here - once with kerchunk to read all the byte ranges, and then optionally once again to read in the values to use to create defaut pandas indexes with xarray.

Wondering if you have any thoughts on how hard it might be to consolidate these @jhamman ?

Copy link

codecov bot commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 96.92308% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 90.04%. Comparing base (970d354) to head (aceba53).

Files Patch % Lines
virtualizarr/tests/test_xarray.py 95.74% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   88.12%   90.04%   +1.91%     
==========================================
  Files          13       13              
  Lines         893      944      +51     
==========================================
+ Hits          787      850      +63     
+ Misses        106       94      -12     
Flag Coverage Δ
unittests 90.04% <96.92%> (+1.91%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TomNicholas TomNicholas merged commit 7927fe3 into main Mar 27, 2024
8 checks passed
@TomNicholas TomNicholas deleted the open_indexes branch March 27, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
xarray Requires changes to xarray upstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inferring concatenation order from coordinate data values
1 participant