Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combinatorial procedures #550

Open
Sideboard opened this issue Oct 5, 2021 · 2 comments
Open

Combinatorial procedures #550

Sideboard opened this issue Oct 5, 2021 · 2 comments
Labels
idea Proposition of an idea and opening an issue to discuss it

Comments

@Sideboard
Copy link
Contributor

Sideboard commented Oct 5, 2021

Motivation

Functions or subroutines that take an array (or a string) and return an array (of higher rank, or of strings) with combinatorial compositions of the input values. Common facilities (e.g. Python's itertools) with example results given here in a non-Fortran nested format:

  • products([1, 2, 3], r=2) returns [[1, 1], [2, 1], [3, 1], [1, 2], [2, 2], [3, 2], [1, 3], [2, 3], [3, 3]]
  • permutations([1, 2, 3], r=2) returns [[2, 1], [3, 1], [1, 2], [3, 2], [1, 3], [2, 3]]
  • combinations([1, 2, 3], r=2) returns [[2, 1], [3, 1], [3, 2]]
  • combinations_with_replacement([1, 2, 3], r=2) returns [[1, 1], [2, 1], [3, 1], [2, 2], [3, 2], [3, 3]]

These examples are in column-major order, which makes sense for indices but less so for other elements. If this behavior is consistent, for strings the result would not be lexically ordered:

  • products('ABC', r=2) returns ['AA', 'BA', 'CA', 'AB', 'BB', 'CB', 'AC', 'BC', 'CC']
  • permutations('ABC', r=2) returns ['BA', 'CA', 'AB', 'CB', 'AC', 'BC']
  • combinations('ABC', r=2) returns ['BA', 'CA', 'CB']
  • combinations_with_replacement('ABC', r=2) returns ['AA', 'BA', 'CA', 'BB', 'CB', 'CC']

The procedures can be extended to lists.

Questions

  • What names/features should be used (e.g. combinations_with_replacement is quite long)?
  • Should the results be in column-major order?
  • Should some be merged into a single procedure?
  • Should be there alternative generator-like routines to get the next iteration from a given or saved state?

Prior Art

Additional Information

@Sideboard Sideboard added the idea Proposition of an idea and opening an issue to discuss it label Oct 5, 2021
@milancurcic
Copy link
Member

What names/features should be used (e.g. combinations_with_replacement is quite long)?

I like the names. For combinations_with_replacement, what do you think about using an optional argument to combinations to create this special case:

combinations(data, r, with_replacement=.true.)

or similar. By default, with_replacement would be .false.. This form is longer than the original name combinations_with_replacement, but I think it makes sense to merge very similar functionality into one function.

Is with_replacement a common word for this? Something like with_duplicates makes a little more sense to me, but I don't know if it's better.

Should the results be in column-major order?

Yes, so the result of products([1, 2, 3], r=2) would have the shape [2, 9]. Likewise for other functions.

Should some be merged into a single procedure?

Yes, see above what I thought about combinations. Others are okay as they are.

Should be there alternative generator-like routines to get the next iteration from a given or saved state?

I think it could be useful, let's get some feedback on it. If yes, I think it should be a separate PR so that we keep PRs as small as possible.

Does this belong in stdlib_math or stdlib_stats?

@Sideboard
Copy link
Contributor Author

Thanks for your input!

I have thought about a with_replacement optional argument. And if that can be merged, maybe all of them should be. The naming is not great to begin with and not consistent as far as I know. (By the way, usually it's either "replacement" or "repetition".)

Apparently "row-major order" is a confusing term. In all cases the number of items should be the first dimension and the number of compositions the second. The question is about the order of those compositions. What comes after "aa", is it "ba" or "ab"? Is it [1,2] or [2,1] after [1,1]? If the first entry increases, it mimics the array-logic of Fortran, but last-entry increases would lead to lexically ordered strings (and/or arrays). Should it be different for strings than for other items?

Feedback sounds great.

Another idea: How about functions that calculate just the size/shape of the results. Those are usually simple expressions like factorial or powers but would be quite convenient to first check if the result is small enough.

I am not sure where this belongs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Proposition of an idea and opening an issue to discuss it
Projects
None yet
Development

No branches or pull requests

2 participants