Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability for hierarchical grouping to maps:groups_from_list #8830

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Maria-12648430
Copy link
Contributor

@Maria-12648430 Maria-12648430 commented Sep 19, 2024

This PR is a suggestion for an extension to maps:groups_from_list/2,3 to create nested groups, or a group hierarchy (also see here).

With this change, maps:groups_from_list/2,3 will also accept lists of instead of only a single key function. The result, accordingly, is a hierarchy of groups, with the keys and groupings on each tier given by the respective key function.

Thus, it is easy to turn, for example, a list of data rows (as may be obtained from a database), like this:

Data = [
    #{continent => asia, country => japan, city => tokyo},
    #{continent => europe, country => germany, city => berlin},
    #{continent => europe, country => germany, city => munich},
    #{continent => europe, country => sweden, city => stockholm}
].

... into a structure like this:

> maps:groups_from_list([fun(#{continent := Continent}) -> Continent end,
                         fun(#{country := Country}) -> Country end],
                        fun(#{city := City}) -> City end,
                        Data).
#{asia => #{japan => [tokyo]},
  europe => #{germany => [belin, munich]
              sweden => [stockholm]}}

While something like this is also doable with the current implementation of maps:groups_from_list/2,3 alone, it involves a significant amount of boilerplate code, which is tiresome and error-prone to write as well as hard to read, understand, and change. Having done this a couple of times in the past, I was thinking that maybe there should be a general out-of-the-box way to do it.

Another performance-related problem with repeated calls to maps:groups_from_list/2,3 (usually via maps:map calls for hierarchical groupings) is that each of those calls results in a list reversal, in order to return the grouped elements in the same order as in the input list. However, from the outside perspective, there is at most 1 reversal required, namely when the number of sub-groupings (ie, the number of tiers in the result hierarchy) is odd: two subsequent reversals will cancel each other out.

The proposal here addresses both of the problems above: no boilerplate code necessary, just give a list of key functions; reduce the number of list reversals to the required minimum, either one or zero.


The implementation we present here is a bit cumbersome. This is because of the way errors have to be for erl_stdlib_errors to work. That is, an error related to invalid input (like, an element of the KeyFuns list is not a 1-ary function, or the list to group on is not a proper list) must originate from the maps:groups_from_list/2,3 call. At the same time, errors happing in any of the given key or value functions must be passed through.


This culminates in a few peculiarities;

  • a call like maps:groups_from_list([], List) succeeds and returns List if it is a list, that is, even if it is an improper list; at the same time, a similar call like maps:groups_from_list([], ValueFun, List) succeeds and returns List mapped by ValueFun if and only if List is a proper list, and fails if List is not a proper list.
  • calls to maps:from_list/2,3, when the given List is an improper list, will fail early when the list of key functions contains an odd number of elements, but will fail only after the first iteration over the given List (ie, the generation of the first tier) if the list of key functions contains an even number of elements.

Copy link
Contributor

github-actions bot commented Sep 19, 2024

CT Test Results

    2 files     95 suites   55m 33s ⏱️
2 157 tests 2 109 ✅ 48 💤 0 ❌
2 516 runs  2 466 ✅ 50 💤 0 ❌

Results for commit 038e924.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@josevalim
Copy link
Contributor

I am not really sure this function belongs in stdlib, as it aims to solve only a very particular use case, where telling folks to reach out to a particular library or a custom solution would be completely fine. For example, I can't recall similar functionality in no other data types of other standard libraries.

If we really want to go down this route, then I would suggest rather having maps:groups_from_lists(KeyFun, ValFun, GroupFun, List), where GroupFun receives the whole group and transforms it before storing it. So you can achieve your functionality by writing:

maps:groups_from_list(
  fun(#{continent := Continent}) -> Continent end,
  fun(X) -> X end,
  fun(Child) ->
    maps:groups_from_list(
      fun(#{country := Country}) -> Country end,
      fun(#{city := City}) -> City end,
      Child
    )                      
  end, Data).

This should allow more use cases but even then I am not convinced it is worth it. :(

@juhlig
Copy link
Contributor

juhlig commented Oct 28, 2024

I am not really sure this function belongs in stdlib

The function already exists, this PR is only an extension. It does not change the current behavior, no existing code needs to be changed.

For example, I can't recall similar functionality in no other data types of other standard libraries.

AFAIK, maps:groups_from_list is the only function that does any grouping, with the result naturally being a map. I don't think it makes sense for any other datatype, except for KV-/proplists maybe.

If we really want to go down this route, then I would suggest rather having maps:groups_from_lists(KeyFun, ValFun, GroupFun, List), where GroupFun receives the whole group and transforms it before storing it.

This looks much more complicated 🫤
It would allow for more use cases I admit, but then if I wanted to do grouping plus, it is most likely that I want that plus to be deep grouping than anything else. We were not aiming to provide a grouping with a one-size-fits-all hammer attached.
I also grant you that there are existing ways to achieve similar results, but they are all much more elaborate.


tl;dr: lets see what the OTP team thinks 😉

@josevalim
Copy link
Contributor

josevalim commented Oct 28, 2024

I was mostly asking about other programming languages that offer such grouping as part of their standard libraries, to use it as reference. In any case, we have had gb_trees, dict, orddict, etc for decades and it seems the use case has not been common enough to warrant similar requests.

The main concern is that we added this particular function to the maps module because the lists module has grown too large. The question is: how do we avoid the maps module becoming too large now? Surely, the function already exists, but even then there is a big difference between a small function that does one thing well and a large function that does multiple things.

It is easy for standard library modules to do too much if they attempt to solve every possible problem. But I'd say that's not their job. Which practical examples could this function be applied to? Erlang/OTP is a large codebase, perhaps there are examples of where this function would be useful in Erlang/OTP itself?

@Maria-12648430
Copy link
Contributor Author

Maria-12648430 commented Oct 28, 2024

I was mostly asking about other programming languages that offer such grouping as part of their standard libraries, to use it as reference. In any case, we have had gb_trees, dict, orddict, etc for decades and it seems the use case has not been common enough to warrant similar requests.

Well, in the light of a recent PR ( 😁 ), supervisor:which_children has also been around for ages, and I'm not aware of requests for getting a specific child instead of all children and then ferreting out the interesting one 😜

The main concern is that we added this particular function to the maps module because the lists module has grown too large. The question is: how do we avoid the maps module becoming too large now? Surely, the function already exists, but even then there is a big difference between a small function that does one thing well and a large function that does multiple things.

It doesn't do multiple things, IMO. It does the same thing (that is, grouping), only adding the ability to group on multiple levels. And wouldn't your suggestion of a separate group function add the "does multiple [different] things" part?

It is easy for standard library modules to do too much if they attempt to solve every possible problem. But I'd say that's not their job. Which practical examples could this function be applied to? Erlang/OTP is a large codebase, perhaps there are examples of where this function would be useful in Erlang/OTP itself?

Frankly, I don't know 🤷‍♀️ (and I think it is asking a bit much of me 😅). I do have my own use cases, several, for which I gave an (necessarily abstracted) example. Admittedly, they are somewhat removed from usages in OTP itself; and if this PR gets rejected, I can as well go with an own implementation, so I don't exactly mind 😜

That said, there are 26 usages of groups_from_list in OTP, 21 of which are in the maps and 2 are in the erl_stdlib_errors modules; I would generally discount those, which leaves us with 3. At a cursory glance, those simply don't need deep grouping.

I guess what I'm trying to say is, this function is probably useful in external stuff (like, mine), and not in OTP itself (which doesn't make big use of even the single-level grouping).

@josevalim
Copy link
Contributor

josevalim commented Oct 28, 2024

Well, in the light of #8976 (:grin:), supervisor:which_children has also been around for ages, and I'm not aware of requests for getting a specific child instead of all children and then ferreting out the interesting one 😜

Right, but when you do a search in Erlang/OTP for which_children, literally the first result could benefit from which_child:

Children = supervisor:which_children(?MODULE),
case [Id || {Id, P, _Type, _Modules} <- Children, P =:= Pid] of

So it is much easier to argue for its benefits. And others can share their experience too.

In any case, I am not asking you to answer of all the questions above. You are 100% right in submitting a pull request 🚀, I am just adding my $.02. Ultimately it is not our decision to make. :)

@Maria-12648430
Copy link
Contributor Author

Ultimately it is not our decision to make. :)

Agree, let's see what others think :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants