Tuning for multiple columns part 3: Utility analysis for multiple aggregation #525

dvadym · 2024-09-10T17:42:05Z

This PR adds utility analysis for the case of several SUM aggregations. It covers DP aggregations that can be expressed in pseudo-SQL as

SELECT partition_key, DP_COUNT(), DP_SUM(column1), DP_SUM(column2)
GROUP BY partition_key
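
For readers unfamiliar with the shape of this query, here is a minimal non-private sketch of what it groups and sums. It adds no noise and does no contribution bounding, so it is not DP. The function name `non_private_reference` and the row layout `(partition_key, column1, column2)` are assumptions made for illustration, not part of this PR.

```python
from collections import defaultdict


def non_private_reference(rows):
    """Groups rows by partition key and returns (count, (sum1, sum2)).

    Hypothetical helper: `rows` is an iterable of
    (partition_key, column1, column2) tuples. No DP noise is added.
    """
    counts = defaultdict(int)
    sums = defaultdict(lambda: [0.0, 0.0])
    for key, column1, column2 in rows:
        counts[key] += 1
        sums[key][0] += column1
        sums[key][1] += column2
    return {key: (counts[key], tuple(sums[key])) for key in counts}


rows = [("a", 1.0, 10.0), ("a", 2.0, 20.0), ("b", 5.0, 50.0)]
result = non_private_reference(rows)
```

The utility analysis in this PR estimates the error that DP would introduce relative to such exact per-partition values, one estimate per SUM column.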

This contains the following changes:

1. In the multi-sum case, min_sum_per_partition/max_sum_per_partition are sequences (instead of floats).
2. A SumCombiner is created for each sum (i.e. for each coordinate of the tuples in 1).
3. CompoundCombiner keeps track of the sparse representation as before. Each SumCombiner receives a 2D array with the values for all columns and extracts the values for its own column.
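
A rough sketch of the per-column idea, assuming each combiner knows its column index and its own (min, max) bounds. The class `ColumnSumSketch` and its method are hypothetical and are not the actual SumCombiner API. It only shows the column extraction and the clipping of the per-partition sum.

```python
from typing import Sequence, Tuple


class ColumnSumSketch:
    """Hypothetical stand-in for a per-column sum combiner."""

    def __init__(self, column_index: int, min_sum: float, max_sum: float):
        self._column_index = column_index
        self._min_sum = min_sum
        self._max_sum = max_sum

    def raw_and_clipped_sum(
            self, values: Sequence[Sequence[float]]) -> Tuple[float, float]:
        # `values` is a 2D structure: one row per contribution, one entry
        # per SUM column. Only this combiner's column is used.
        raw = sum(row[self._column_index] for row in values)
        clipped = min(max(raw, self._min_sum), self._max_sum)
        return raw, clipped


# Bounds are sequences, one entry per SUM column (assumed example values).
min_sum_per_partition = (0.0, 0.0)
max_sum_per_partition = (2.0, 100.0)

sketches = [
    ColumnSumSketch(i, lo, hi) for i, (lo, hi) in enumerate(
        zip(min_sum_per_partition, max_sum_per_partition))
]

values = [(1.0, 10.0), (2.0, 20.0)]  # contributions to one partition
results = [s.raw_and_clipped_sum(values) for s in sketches]
```

Here the first column's sum is clipped by its upper bound while the second one is left unchanged, which is why the bounds have to be tracked per column.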

miracvbasaran · 2024-09-11T14:45:20Z

analysis/data_structures.py

+                if size1 is None or size2 is None or size1 != size2:
+                    raise ValueError("If elements of min_sum_per_partition and "
+                                     "max_sum_per_partition are sequences, then"
+                                     " they must have the same length.")


nit: whitespace at the end of line like above

This is a multi-line string that doesn't contain newlines, so it doesn't matter where the spaces go.
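
To illustrate the point in this thread, here is a standalone sketch of the length check. The helper names `_size` and `check_sum_bounds`, and the handling of the scalar case, are assumptions and not the code in analysis/data_structures.py. It also shows that adjacent string literals are concatenated, so the space can sit at the end of one piece or at the start of the next.

```python
from typing import Optional

MESSAGE_A = ("If elements of min_sum_per_partition and "
             "max_sum_per_partition are sequences, then"
             " they must have the same length.")
MESSAGE_B = ("If elements of min_sum_per_partition and "
             "max_sum_per_partition are sequences, then "
             "they must have the same length.")


def _size(bound) -> Optional[int]:
    """Returns the length of a sequence bound, or None for a scalar."""
    return len(bound) if isinstance(bound, (list, tuple)) else None


def check_sum_bounds(min_sum_per_partition, max_sum_per_partition) -> None:
    """Hypothetical validation: sequence bounds must have equal length."""
    size1 = _size(min_sum_per_partition)
    size2 = _size(max_sum_per_partition)
    if size1 is None and size2 is None:
        return  # Assumed: both scalars means the single-sum case.
    if size1 is None or size2 is None or size1 != size2:
        raise ValueError(MESSAGE_A)


# Both spellings produce the same string after concatenation.
same_message = MESSAGE_A == MESSAGE_B
```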

analysis/parameter_tuning.py

analysis/per_partition_combiners.py

dvadym

Thanks for the review!

dvadym · 2024-09-12T09:20:07Z

This is a multi-line string that doesn't contain newlines, so it doesn't matter where the spaces go.

dvadym added 7 commits September 10, 2024 19:18

wip

e3054e8

Merge branch 'main' into multi_tuning3

b809ae9

wip

d196abd

tests for data_structures

861005a

Per partition combiner tests

6f92164

CompoundCombiner test

1675b5f

tests

cf0f5c2

dvadym changed the title ~~(WIP) Tuning for multiple columns part 3: Utility analysis for multiple aggregation~~ Tuning for multiple columns part 3: Utility analysis for multiple aggregation Sep 11, 2024

dvadym requested a review from miracvbasaran September 11, 2024 13:43

miracvbasaran approved these changes Sep 11, 2024


dvadym added 6 commits September 11, 2024 17:13

UtilityAnalysisEngine test

1d43e37

Addressed comments

ba7eba2

Small fixes

9fe01eb

Small fixes

e93b730

Fix for empty public partitions

df16d7a

Small fixes

9e9653a

dvadym commented Sep 12, 2024


dvadym merged commit b09c365 into OpenMined:main Sep 12, 2024
6 of 14 checks passed
