Changing the default ak.flatten axis from axis=1 to axis=None: any complaints? #845

jpivarski · 2021-04-19T19:44:07Z

jpivarski
Apr 19, 2021
Maintainer

ak.flatten flattens one level of list depth by default:

>>> array = ak.Array([[[1, 2, 3], []], [], [[4, 5]]])
>>> ak.flatten(array)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>
>>> ak.flatten(ak.flatten(array))
<Array [1, 2, 3, 4, 5] type='5 * int64'>

This is what users of functional programming languages (LISPs, Scala, Spark) would expect:

scala> val array = List(List(List(1, 2, 3), List()), List(), List(List(4, 5)))
array: List[List[List[Int]]] = List(List(List(1, 2, 3), List()), List(), List(List(4, 5)))

scala> array.flatten
res0: List[List[Int]] = List(List(1, 2, 3), List(), List(4, 5))

scala> array.flatten.flatten
res1: List[Int] = List(1, 2, 3, 4, 5)

but it has come up more than once as a surprise. Users in our community expect ak.flatten to "completely flatten," which is an option, but not the default one:

>>> ak.flatten(array, axis=None)
<Array [1, 2, 3, 4, 5] type='5 * int64'>

In part, this may be because other dimension-reducing operations have axis=None as a default, such as the reducers:

>>> ak.sum(array)   # no argument, assumes you want to completely sum
15
>>> ak.sum(array, axis=0)
<Array [[5, 7, 3], []] type='2 * var * int64'>
>>> ak.sum(array, axis=1)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>
>>> ak.sum(array, axis=2)
<Array [[6, 0], [], [9]] type='3 * var * int64'>

Actually, I would have had the default axis for reducers be axis=-1, but NumPy forced the default to be axis=None. (I don't see ak.flatten as being "similar to" ak.sum et al.)

If there's a strong consensus—i.e. a lot of people "+1"ing this or otherwise chiming in—then we can change the default. That will be rough, since the name "ak.flatten" is a good one that I don't want to change and a switch in default without renaming the function is going to break somebody's code, possibly in subtle ways (i.e. wrong answer, rather than an error message like "this function doesn't exist anymore, use the new name with the new default").

Here's how it could be done, if there's a groundswell of support for it: I can immediately make the default argument a dummy object that, if detected, will raise a warning saying, "`ak.flatten` default axis is changing from 1 to None in version 1.4.0 (2021-08-01); please specify an explicit axis for now" and then change the default at that time. This is referring to the scheduled semi-major releases in which breaking changes are allowed. Anyone who upgrades Awkward Array between now and August will get the message and adjust. Anyone who doesn't (people who have already upgraded to Awkward 1.x but do not upgrade again in this 3.5-month window) won't see the message and risk getting an error due to a default behavior changing under them.
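A minimal sketch of that dummy-default approach, for illustration only. It works on nested Python lists rather than Awkward Arrays, and the names `_UNSET`, `flatten`, and `completely_flatten` are hypothetical stand-ins, not Awkward's actual implementation:

```python
import warnings

# Hypothetical sentinel: distinguishes "axis not passed" from every real value,
# including None, which is itself a meaningful axis here.
_UNSET = object()


def completely_flatten(nested):
    """Recursively collect every leaf of a nested list (the axis=None behavior)."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(completely_flatten(item))
        else:
            out.append(item)
    return out


def flatten(nested, axis=_UNSET):
    """Toy flatten over nested lists; only axis=1 and axis=None are handled."""
    if axis is _UNSET:
        warnings.warn(
            "ak.flatten default axis is changing from 1 to None in version "
            "1.4.0 (2021-08-01); please specify an explicit axis for now",
            FutureWarning,
            stacklevel=2,
        )
        axis = 1  # keep the current default until the scheduled release
    if axis is None:
        return completely_flatten(nested)
    if axis == 1:
        # Remove exactly one level of nesting.
        return [inner for outer in nested for inner in outer]
    raise ValueError("this sketch only supports axis=1 and axis=None")


array = [[[1, 2, 3], []], [], [[4, 5]]]
print(flatten(array, axis=1))     # explicit axis, so no warning is raised
print(flatten(array, axis=None))  # explicit axis, so no warning is raised
```

Only a call that omits `axis` triggers the warning, so code that already spells out its axis is unaffected both before and after the default changes.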

As an alternative, we could leave ak.flatten as it is and introduce ak.ravel (a NEP 18 overload of np.ravel), which is just ak.flatten with axis=None. That doesn't help people who are incorrectly guessing that ak.flatten completely flattens—it's a different function name to remember—but it seems appropriate to me because "np.ravel" is the NumPy word for "complete flatten," and it doesn't conflict with the usage of "flatten" in functional programming.
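To illustrate what a NEP 18 overload of `np.ravel` involves, here is a hedged sketch using the `__array_function__` protocol. `ToyJagged` and `completely_flatten` are hypothetical stand-ins built on nested Python lists; this is not how Awkward Array implements its overloads:

```python
import numpy as np


def completely_flatten(nested):
    """Recursively collect every leaf of a nested Python list."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(completely_flatten(item))
        else:
            out.append(item)
    return out


class ToyJagged:
    """Hypothetical jagged container; not an Awkward Array class."""

    def __init__(self, data):
        self.data = data

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook when a ToyJagged is passed to a NumPy function.
        if func is np.ravel:
            return ToyJagged(completely_flatten(args[0].data))
        return NotImplemented  # every other NumPy function is unsupported here


raveled = np.ravel(ToyJagged([[[1, 2, 3], []], [], [[4, 5]]]))
print(raveled.data)
```

With this hook in place, users who already know `np.ravel` get the "complete flatten" behavior without learning a new function name.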

So, let me know what you think! This thread was prompted by #832, and @masonproffitt and @agoose77, if you want to point others to this to avoid letting this thread die in obscurity, please do so.

agoose77 · 2021-04-20T09:01:14Z

agoose77
Apr 20, 2021
Collaborator

Hey @jpivarski, this is a great summary of where things stand.

I think that it would be worth overloading np.ravel for Awkward arrays regardless of what the equivalent Awkward function is actually called. Whether to do this with a new Awkward function or not is, I suppose, a matter of taste. I would be tempted to introduce an ak.ravel function in order to keep API parity with NumPy.

Concerning ak.flatten, as a non-fp user, the name is non-intuitive; it's not "flattening" the array, rather it is expanding/dropping one (preferred) dimension. However, I can see that Scala does use the axis=1 convention, and therefore if this name is "standard", then it would be best to keep axis=1. I would go as far as to say that if np.ravel has a similarly named Awkward counterpart, then most users will reach for that rather than flatten, and the point becomes moot.

2 replies

masonproffitt Apr 20, 2021

Interestingly, I would argue that ravel is the non-intuitive name. To me, it sounds like the exact opposite behavior ("tangling things up"). Removing a dimension is exactly what flattening is: I think of flattening a 3D box to a rectangle or flattening a rectangle to a line. Even in the documentation for np.ravel, the behavior of the function is explained as returning a "flattened" array.

NumPy's flatten uses the axis=None behavior (though without an actual axis parameter), which was the primary reason I suggested this change for ak.flatten. I had actually never heard of np.ravel before the issue I opened (#832). I've always used flatten in both NumPy and Awkward, so I would not be so sure that users would go to ravel instead of flatten. Personally I would expect the opposite, especially since awkward0.JaggedArray has a flatten, although there the default is axis=0 but the meaning is different such that it corresponds to ak.flatten's axis=1.
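For reference, a small sketch of the NumPy behavior being compared here, on a regular (non-jagged) array. The variable names are illustrative only:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # regular 3-dimensional array

flat_method = cube.flatten()  # ndarray method: always returns a copy
flat_ravel = np.ravel(cube)   # top-level function: returns a view when it can

# Both collapse every dimension at once; neither takes an axis argument.
print(flat_method.shape, flat_ravel.shape)

# The practical difference is copying, not the resulting shape.
print(np.shares_memory(cube, flat_method), np.shares_memory(cube, flat_ravel))
```

So in NumPy the two names give the same one-dimensional result, and neither has a "remove one level" mode like ak.flatten's axis=1.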

agoose77 Apr 20, 2021
Collaborator

Ah, well in terms of etymology, ravel is entirely unintuitive, but here we are 🤕

As flatten is an ndarray method rather than top-level function, I think it is likely used less frequently (though I have no data to support this claim). I suppose this raises the question of whether Awkward should be NumPy-like, or classical FP-like. I have been using Awkward from the vantage point of a NumPy user, so that guides my opinion on what is "intuitive" for "new" users.

To fully align with NumPy conventions, I would probably want to make flatten and ravel aliases of the same function implementing the existing flatten(axis=None) behaviour, and introduce a new function e.g. expand, drop, simplify, relax. These are terrible names; the point here is the motivation for doing it :)
