
fix for empty query results #128

Merged
merged 3 commits into from
Aug 9, 2024

Conversation

dougbrn (Collaborator) commented Aug 2, 2024

Change Description

Resolves #129.

  • My PR includes a link to the issue that I am addressing

Solution Description

This PR addresses the issue flagged in #129, where a query that returns an empty result in at least one partition on the nested-dask side can raise an error. The issue is not reproducible in nested-pandas alone, because it stems from Dask's sensitivity to metadata and dtypes: the empty series we produced for an empty query did not carry an index with a matching name and dtype (the dtype being the important piece). This PR makes a small tweak so that the correctly dtyped index is propagated to the empty result.
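To make the failure mode concrete, here is a minimal pandas-only sketch (it does not run nested-dask or Dask). The helpers `empty_result_naive`, `empty_result_with_index`, and `index_matches`, along with the example partition, are hypothetical illustrations, not code from this PR. The sketch contrasts an empty series built from scratch, which gets a default unnamed index, with one built from the partition's own index sliced to length zero, which keeps the index name and dtype.

```python
import pandas as pd


def empty_result_naive(template: pd.Series) -> pd.Series:
    """Hypothetical: build an empty result from scratch (the buggy pattern)."""
    # pandas assigns a default, unnamed index here, so its name (and
    # usually its dtype) no longer match the partition's index.
    return pd.Series([], dtype=template.dtype, name=template.name)


def empty_result_with_index(template: pd.Series) -> pd.Series:
    """Hypothetical: build an empty result that propagates the template's index."""
    # Slicing the original index to length zero keeps its name and dtype.
    return pd.Series(
        [], index=template.index[:0], dtype=template.dtype, name=template.name
    )


def index_matches(result: pd.Series, template: pd.Series) -> bool:
    """Compare the two index properties highlighted in the description above."""
    return (
        result.index.name == template.index.name
        and result.index.dtype == template.index.dtype
    )


# Hypothetical partition: float values indexed by a named string "id" index.
partition = pd.Series(
    [1.0, 2.0],
    index=pd.Index(["a", "b"], name="id"),
    name="flux",
)

naive = empty_result_naive(partition)
fixed = empty_result_with_index(partition)

print("naive index:", naive.index.dtype, naive.index.name)
print("fixed index:", fixed.index.dtype, fixed.index.name)
print("naive matches:", index_matches(naive, partition))
print("fixed matches:", index_matches(fixed, partition))
```

Slicing with `index[:0]` is one way to get an empty index with the right metadata; `template.iloc[:0]` would give the same result for the whole series.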

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

Build/CI Change Checklist

  • If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
  • If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

  • Any new or updated docstrings use the NumPy docstring format.
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover any changes
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)


codecov bot commented Aug 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.44%. Comparing base (71ae3ba) to head (743de7b).
Report is 72 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #128   +/-   ##
=======================================
  Coverage   99.44%   99.44%           
=======================================
  Files          14       14           
  Lines         897      897           
=======================================
  Hits          892      892           
  Misses          5        5           

github-actions bot commented Aug 2, 2024

Before [5a6ab4a]  After [9b21ddb]  Ratio  Benchmark (Parameter)
32.6±1ms          33.7±2ms         1.03   benchmarks.AssignSingleDfToNestedSeries.time_run
260M              262M             1.01   benchmarks.AssignSingleDfToNestedSeries.peakmem_run
87.1M             87.1M            1      benchmarks.NestedFrameAddNested.peakmem_run
94.1M             94M              1      benchmarks.NestedFrameQuery.peakmem_run
90.6M             90.7M            1      benchmarks.NestedFrameReduce.peakmem_run
60.3±3ms          59.5±3ms         0.99   benchmarks.ReassignHalfOfNestedSeries.time_run
284M              278M             0.98   benchmarks.ReassignHalfOfNestedSeries.peakmem_run
1.12±0.03ms       1.08±0.02ms      0.97   benchmarks.NestedFrameReduce.time_run
6.88±0.3ms        6.33±0.1ms       0.92   benchmarks.NestedFrameQuery.time_run
10.1±0.2ms        9.04±0.1ms       0.9    benchmarks.NestedFrameAddNested.time_run

dougbrn requested a review from wilsonbb on August 9, 2024 at 18:46
dougbrn marked this pull request as ready for review on August 9, 2024 at 18:46
wilsonbb (Contributor) left a comment

Looks good!

Approving, but could we add a small accessor unit test for the empty case of `query_flat`? Then we can verify there that the index is being propagated.
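The kind of test the reviewer asks for can be sketched in pytest style. This is a minimal, hypothetical example: `filter_or_empty` is an illustrative stand-in for an empty-result code path, not the nested-pandas accessor method itself, and the `obj_id`/`flux` data are made up. The test checks that an empty result keeps the input's index name and dtype, which is the property the fix is meant to guarantee.

```python
import pandas as pd


def filter_or_empty(values: pd.Series, mask: pd.Series) -> pd.Series:
    """Hypothetical helper: apply a boolean mask, building the empty case explicitly."""
    if not mask.any():
        # Empty case: build the result from the input's index, sliced to
        # length zero, so the index name and dtype are propagated.
        return pd.Series(
            [], index=values.index[:0], dtype=values.dtype, name=values.name
        )
    return values[mask]


def test_empty_query_propagates_index():
    values = pd.Series(
        [0.5, 1.5, 2.5],
        index=pd.Index([10, 10, 11], dtype="int64", name="obj_id"),
        name="flux",
    )
    # A condition no row satisfies, so the result must be empty.
    result = filter_or_empty(values, values > 100)

    assert len(result) == 0
    assert result.index.name == values.index.name
    assert result.index.dtype == values.index.dtype
    assert result.dtype == values.dtype


if __name__ == "__main__":
    test_empty_query_propagates_index()
    print("ok")
```

Asserting on the index dtype, and not only on the length, is what would catch a regression to a default index.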

dougbrn merged commit ac40d0f into main on Aug 9, 2024
11 checks passed
dougbrn deleted the empty_query_fix branch on August 9, 2024 at 19:11
Development

Successfully merging this pull request may close these issues.

Queries that return empty results can fail
2 participants