Group By: add straightforward possibility to do aggregations over all records #6976
Comments
We had quite a discussion at today's live meeting. The arguments against were that this is not commonly used and that it doesn't belong in Group By because it's about aggregation, not grouping. On the other hand, everything needed is already in the widget. We could have an option "All rows", but it would sound wrong: how do you group by all rows? We found a compromise solution. Currently you can select multiple variables, but you have to select at least one. We can allow deselecting all variables and therefore "group by nothing" (= not group).
That could be a solution albeit not a very intuitive one.
I know. We had this in mind. On the other hand, it's not intuitive to use the Group By widget to aggregate over all data, so our reasoning was that if anybody actually uses this widget for that purpose, he'll get this idea precisely by seeing that he can "group by nothing".
I've probably been influenced by my use of RapidMiner, which I had used before Orange (and dropped because I prefer to use open-source software in teaching). In RapidMiner there's an operator (≈widget) "Aggregate" that includes group-by. So I guess aggregation is an overarching operation that includes grouping. Since the widget is called Group By, it cannot include overall aggregation. If the widget had been called Aggregate (which, I agree, would cause confusion with Aggregate Columns), it could have included Group By.
You are right. Even the current name of the widget is misleading; it should be something like "Aggregate over groups" (obviously too long). Renaming it to Aggregate would make sense even if we don't change it -- though after renaming we could easily include the overall aggregation.
What's your use case?
What's your proposed solution?
Sometimes it is useful to have several aggregations of selected variables over all the data records. To that end, it would be nice to have "all rows" as an option in the column on the left in the Group By interface.
Are there any alternative solutions?
The obvious trick to achieve this with the current functionality is to introduce a dummy variable that has the same value for all rows, and to group by the dummy variable. However, as a workaround it is not that intuitive.
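To illustrate why the dummy-variable workaround and "grouping by nothing" give the same result, here is a minimal sketch in plain Python. It is not Orange's API: the `group_by` helper, the column names, and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_by(rows, keys, aggregations):
    """Group rows (dicts) by the given key columns and aggregate each group.

    `aggregations` maps an output name to a (column, function) pair.
    With an empty `keys` list every row gets the key (), so all rows
    end up in a single group. (Hypothetical helper, not an Orange function.)
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {
        key: {
            name: func([r[col] for r in members])
            for name, (col, func) in aggregations.items()
        }
        for key, members in groups.items()
    }


# Made-up sample data.
rows = [
    {"species": "a", "length": 1.0},
    {"species": "a", "length": 3.0},
    {"species": "b", "length": 5.0},
]
aggs = {"mean_length": ("length", mean), "max_length": ("length", max)}

# Workaround: add a dummy column with the same value in every row.
with_dummy = [dict(r, dummy=0) for r in rows]
via_dummy = group_by(with_dummy, ["dummy"], aggs)

# "Group by nothing": no key columns, so all rows fall into one group.
via_nothing = group_by(rows, [], aggs)

print(via_dummy)
print(via_nothing)
```

Both calls produce a single group with the same aggregated values; only the group key differs (`(0,)` for the dummy column, `()` for no grouping variables).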