fix(rust,python): use actual number of read rows for hive materialization #11690
@@ -99,9 +99,11 @@ pub(super) fn array_iter_to_series(
 fn materialize_hive_partitions(
     df: &mut DataFrame,
     hive_partition_columns: Option<&[Series]>,
-    num_rows: usize,
+    num_rows: Option<usize>,
Where is this `Some` passed? I only see it called with `None`.
Hmm, nowhere actually - will remove the parameter.
There was a note on top of the function, `We have a special num_rows arg, as df can be empty.`, but there doesn't seem to be any need for special handling in this case - I added a test that loads `.head(0)` and it seems to work fine.
Thank you for the quick fix @nameexhaustion. Much appreciated.
Fixes #11682
`md.num_rows()` cannot be used, as `column_idx_to_series` may not read the entire row group when there is a SELECT->LIMIT.
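
To illustrate the idea, here is a minimal, self-contained sketch that does not use Polars at all. `ToyFrame`, `read_row_group`, and `materialize_partitions` are hypothetical names invented for this example; they only model the behaviour described above, where the partition columns are sized from the rows actually read rather than from the row-group metadata.

```rust
/// Toy stand-in for a data frame: named columns of equal length.
/// Hypothetical type for illustration only; this is not the Polars `DataFrame`.
#[derive(Debug, Default)]
struct ToyFrame {
    columns: Vec<(String, Vec<String>)>,
}

impl ToyFrame {
    /// Number of rows actually held (0 when there are no columns).
    fn height(&self) -> usize {
        self.columns.first().map_or(0, |(_, values)| values.len())
    }
}

/// Simulates reading one row group under an optional LIMIT: at most
/// `limit` of the `rows_in_group` rows end up in the frame, so the
/// metadata row count can overstate what was read.
fn read_row_group(rows_in_group: usize, limit: Option<usize>) -> ToyFrame {
    let n = limit.map_or(rows_in_group, |l| l.min(rows_in_group));
    let values = (0..n).map(|i| format!("row{i}")).collect();
    ToyFrame {
        columns: vec![("a".to_string(), values)],
    }
}

/// Appends each partition value as a constant column whose length is the
/// frame's actual height, not the row group's metadata row count.
fn materialize_partitions(df: &mut ToyFrame, partitions: &[(&str, &str)]) {
    let n = df.height();
    for (name, value) in partitions {
        df.columns.push((name.to_string(), vec![value.to_string(); n]));
    }
}
```

In this sketch, reading a 100-row group with a limit of 3 and then materializing a `year` partition yields columns of length 3. Sizing the partition column from the metadata count of 100 would instead produce mismatched column lengths. A limit of 0 yields an empty partition column with no special handling.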