-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove_diagonal
option for ak.cartesian
and ak.argcartestian
#3143
Comments
This is a good idea. (That's what the argument is named in The best way to implement it is as a post-processing step, taking advantage of the fact that the output has a predictable shape per entry, like |
I @'d Iason because he's run into this a few times. Perhaps a more natural place to put this is in
|
I'd say it sounds more natural to me as a part of |
How would you feel about having an idiomatic construction, instead of a library feature? Given numbers = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]) this makes the upper triangle of each entry (at tuples = ak.combinations(numbers, 2, axis=1) The lower triangle can be constructed by swapping the order of fields in the tuple, through an ak.unzip, reverse order of the Python tuple, followed by ak.zip. ak.zip(ak.unzip(tuples)[::-1]) Since we want both the lower triangle and the upper triangle, without the diagonal, we concatenate them at the same ak.concatenate([tuples, ak.zip(ak.unzip(tuples)[::-1])], axis=1) A potential problem with this is the order, if that matters. The original pairs scan through the matrix as right-then-down (English reading order), but the lower triangle is constructed as down-then-right (Mongolian). If that's a problem, then you can construct this from ak.argcartesian (must be "arg") with a filter: tuples = ak.argcartesian([numbers, numbers])
no_diagonal = tuples[tuples["0"] != tuples["1"]] And then the "arg" version can be applied to the original data to get the non-"arg": ak.zip([numbers[no_diagonal["0"]], numbers[no_diagonal["1"]]]) The method through ak.argcartesian also generalizes beyond pairs ( If we are to implement it in Awkward (I'm leaning more toward making it an argument of ak.combinations now), then we'd use a method like this to do it. Probably with ak.argcartesian because we'd need it to be general for it to go into a library. In fact, @lgray, I know that you need ak.combinations for one of the 8 GPU tests, and ak.combinations is implemented with a kernel. ak.cartesian is not implemented with a kernel; it uses a trick @nsmith- came up with to implement it in terms of broadcasting. Broadcasting is complicated and involves some kernels, so I don't know if you'll run into trouble if you try replacing ak.combinations with tuples = ak.argcartesian([numbers, numbers])
upper_triangle = tuples[tuples["1"] > tuples["0"]]
ak.zip([numbers[upper_triangle["0"]], numbers[upper_triangle["1"]]]) |
This produces more intermediates and is honestly a little confusing compared to having a parameters that clearly alters behavior in a specific way. |
Re: intermediates—maybe you don't want to look at the way many of the functions are implemented... But I agree that a function parameter is a lot more intuitive and discoverable than idioms like this. This remains an open feature request and I think I know which student's project it would fit under. The implementation would very likely be the one based on ak.argcartesian, generalized to any |
Well there is a problem with concatenating in dask-awkward which is why a separate function argument in ak.combinations would be good. If I concatenated on axis 0, then I had to also concatenate on axis 0 everything else that I wanted to keep the same length as my pairs. If I remember correctly, there was also an issue with global indexing there. If I concentrated on axis 1, that made my code 3 times slower and Martin said that this was because you shuffle a lot of data between partitions that way. |
Description of new feature
It is a common operation in particle physics that we'll want a cartestian product with the diagonal removed.
This occurs in "tag-and-probe" workflows used to measure physics object reconstruction efficiencies where we want to consider tagged objects also as probes to increase statistics and remove possible sources of population biases.
Here it is never the case that we want the diagonal of a cartesian product and right now it has to be removed by:
Which costs tasks in dask-awkward and is generally distracting from physics intent of code.
It would be better if, instead, we could:
and collapse all this to a single operation that also makes the intent more clear.
The text was updated successfully, but these errors were encountered: