Adds fine filtering on read_hipscat #350

Merged
merged 9 commits into from
Jun 21, 2024


camposandro
Collaborator

Adds fine filtering to the read_hipscat operation. Previously, when a search_filter was provided, the catalog was loaded with only the pixels overlapping the specified spatial region, but the data inside those pixels was not filtered. This behavior confused users, so we decided to apply the full search. The search still runs in two steps: first we determine which subset of files to read from disk and create the Dask DataFrame with those partitions, then we filter the data in each partition. Closes #308.
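A minimal sketch of the two-step idea described above, in plain Python rather than with LSDB or Dask. Every name here (`Box`, `coarse_select`, `fine_filter`, `read_with_filter`, and the layout of the partition dictionaries) is hypothetical and chosen for illustration only; it is not the code in this PR. It also assumes a simple box region and ignores right-ascension wraparound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned sky region in degrees (no RA wraparound)."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap when their ranges intersect on both axes.
        return (self.ra_min <= other.ra_max and other.ra_min <= self.ra_max
                and self.dec_min <= other.dec_max and other.dec_min <= self.dec_max)

    def contains(self, ra: float, dec: float) -> bool:
        return self.ra_min <= ra <= self.ra_max and self.dec_min <= dec <= self.dec_max


def coarse_select(partitions: dict, region: Box) -> dict:
    """Step 1: keep only partitions whose bounds overlap the region."""
    return {pid: p for pid, p in partitions.items() if p["bounds"].overlaps(region)}


def fine_filter(rows: list, region: Box) -> list:
    """Step 2: keep only rows that actually fall inside the region."""
    return [row for row in rows if region.contains(row["ra"], row["dec"])]


def read_with_filter(partitions: dict, region: Box) -> dict:
    """Coarse partition selection followed by per-partition filtering."""
    selected = coarse_select(partitions, region)
    return {pid: fine_filter(p["rows"], region) for pid, p in selected.items()}


# Toy stand-in for a partitioned catalog: three partitions with bounding boxes.
PARTITIONS = {
    0: {"bounds": Box(0, 10, 0, 10),
        "rows": [{"ra": 1, "dec": 5}, {"ra": 9, "dec": 9}]},
    1: {"bounds": Box(10, 20, 0, 10),
        "rows": [{"ra": 11, "dec": 5}, {"ra": 19, "dec": 1}]},
    2: {"bounds": Box(20, 30, 0, 10),
        "rows": [{"ra": 25, "dec": 5}]},
}

REGION = Box(8, 12, 0, 10)
result = read_with_filter(PARTITIONS, REGION)
```

In this toy setup, coarse selection alone would return every row of partitions 0 and 1, including points outside the region, which mirrors the earlier behavior the description calls confusing. The second step trims each selected partition to the rows inside the region.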

@camposandro camposandro self-assigned this Jun 6, 2024

github-actions bot commented Jun 6, 2024

| Before [e0d3fe4] | After [bae9e05] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 6.74±0.1ms | 6.83±0.2ms | 1.01 | benchmarks.time_box_filter_on_partition |
| 484±1ms | 483±4ms | 1 | benchmarks.time_create_midsize_catalog |
| 51.1±0.6ms | 51.1±0.6ms | 1 | benchmarks.time_kdtree_crossmatch |
| 16.1±0.5ms | 16.1±0.4ms | 1 | benchmarks.time_polygon_search |
| 3.23±0s | 3.21±0.02s | 0.99 | benchmarks.time_create_large_catalog |


codecov bot commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.33%. Comparing base (e0d3fe4) to head (8c69f00).
Report is 86 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lsdb/core/search/abstract_search.py | 66.66% | 1 Missing ⚠️ |
| src/lsdb/core/search/index_search.py | 0.00% | 1 Missing ⚠️ |
| src/lsdb/core/search/order_search.py | 0.00% | 1 Missing ⚠️ |
| src/lsdb/core/search/pixel_search.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #350      +/-   ##
==========================================
+ Coverage   98.32%   98.33%   +0.01%     
==========================================
  Files          43       43              
  Lines        1430     1440      +10     
==========================================
+ Hits         1406     1416      +10     
  Misses         24       24              


@camposandro camposandro marked this pull request as ready for review June 10, 2024 17:43
Review thread on src/lsdb/catalog/dataset/healpix_dataset.py (resolved)
Review thread on src/lsdb/types.py (outdated, resolved)
@camposandro camposandro merged commit 79d4cb3 into main Jun 21, 2024
11 of 12 checks passed
@camposandro camposandro deleted the issue/308/fine-filtering-on-read-hipcat branch June 21, 2024 18:03
Successfully merging this pull request may close these issues.

read_hipscat(search_filter) is confusing
2 participants