Tuning for multiple columns part 3: Utility analysis for multiple aggregation #525
if size1 is None or size2 is None or size1 != size2:
    raise ValueError("If elements of min_sum_per_partition and "
                     "max_sum_per_partition are sequences, then"
                     " they must have the same length.")
nit: whitespace at the end of line like above
This is a multi-line string that doesn't contain newlines, so it doesn't matter where the spaces go.
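For illustration, a minimal sketch of the point about adjacent string literals (the variable names are made up for this example): Python joins them at compile time without inserting anything, so a space at the end of one piece or at the start of the next yields the same message.

```python
# Adjacent string literals are concatenated at compile time; no newline or
# extra whitespace is added between the pieces.
space_at_end = ("max_sum_per_partition are sequences, then "
                "they must have the same length.")
space_at_start = ("max_sum_per_partition are sequences, then"
                  " they must have the same length.")

# Both spellings produce exactly the same single-line string.
assert space_at_end == space_at_start
assert "\n" not in space_at_start
```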
Thanks for review!
This PR adds utility analysis for the case of several SUM aggregations. It covers cases where the DP aggregations can be expressed in pseudo-SQL terms as
This contains the following changes:
1. `min_sum_per_partition`/`max_sum_per_partition` will be sequences (instead of floats).
2. A `SumCombiner` is created for each sum (i.e. for each coordinate of the tuples in 1).
3. `CompoundCombiner` keeps track of the sparse representation as before; each `SumCombiner` receives a 2D array of values for all columns, but extracts the values for its own column.
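
A rough sketch of the idea, not the code from this PR: `validate_bounds`, `ColumnSum` and `per_column_sums` are hypothetical names, and clipping the per-column total to its bounds is an assumption made only to show how sequence-valued bounds pair up with columns. Each per-column object receives the rows with all columns and picks out its own.

```python
from typing import List, Sequence


def validate_bounds(min_sums: Sequence[float],
                    max_sums: Sequence[float]) -> int:
    """Returns the number of SUM columns, or raises if the bounds disagree."""
    if len(min_sums) != len(max_sums):
        raise ValueError("min_sum_per_partition and max_sum_per_partition "
                         "must have the same length.")
    return len(min_sums)


class ColumnSum:
    """Hypothetical stand-in for a per-column sum combiner."""

    def __init__(self, column: int, min_sum: float, max_sum: float):
        self._column = column
        self._min_sum = min_sum
        self._max_sum = max_sum

    def clipped_sum(self, rows: Sequence[Sequence[float]]) -> float:
        # Every instance is given all columns and extracts only its own.
        total = sum(row[self._column] for row in rows)
        # Assumption for this sketch: the total is clipped to the bounds.
        return min(max(total, self._min_sum), self._max_sum)


def per_column_sums(rows: Sequence[Sequence[float]],
                    min_sums: Sequence[float],
                    max_sums: Sequence[float]) -> List[float]:
    """Creates one ColumnSum per column and applies each to the same rows."""
    n_columns = validate_bounds(min_sums, max_sums)
    combiners = [
        ColumnSum(i, min_sums[i], max_sums[i]) for i in range(n_columns)
    ]
    return [combiner.clipped_sum(rows) for combiner in combiners]


# Three rows, two SUM columns.
rows = [(1.0, 10.0), (2.0, 20.0), (4.0, 50.0)]
result = per_column_sums(rows, min_sums=[0.0, 0.0], max_sums=[5.0, 100.0])
print(result)  # [5.0, 80.0]: column 0 sums to 7.0 and is clipped to 5.0
```

Passing the whole 2D structure to every per-column object keeps the data layout shared across columns, at the cost of each object needing to know its column index.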