-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Change API for writing partitioned Parquet to reduce code duplication #17586
Conversation
@@ -258,7 +155,7 @@ where | |||
.par_iter() | |||
.map(|group| { | |||
let df = unsafe { | |||
df._take_unchecked_slice_sorted(group, false, IsSorted::Ascending) | |||
df._take_unchecked_slice_sorted(group, true, IsSorted::Ascending) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by - we can enable parallel gather now that we are processing in chunks.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17586 +/- ##
==========================================
+ Coverage 80.45% 80.47% +0.02%
==========================================
Files 1484 1484
Lines 195341 195485 +144
Branches 2778 2776 -2
==========================================
+ Hits 157154 157322 +168
+ Misses 37675 37652 -23
+ Partials 512 511 -1 ☔ View full report in Codecov by Sentry. |
Very nice to consolidate those 2 functions!
|
I also think this would be more readable. Maybe even a |
The reason we are considering the |
yeah, totally get that! I think python and rust have very different approaches here
|
We can just raise when I'd say add a |
If we go for mutual excl params (which I am willing to go when properly raised), I do really think we shouldn't have nested parameters. That's just worst of both options, as you still have the worse/"unpythonic" ergonomics, but no type guarantees. @nameexhaustion can you create the needed parameters and mark them as |
Updated |
Move the partition parameters to existing
scan_*
functions instead of creating an entirely new function per scan type. There is also a bit of pre-work for getting IPC support in.Before
After