Add boolean-based indexing (e.g., Y[Z == 1]) #3300

Open
andrjohns opened this issue Jul 16, 2024 · 3 comments

@andrjohns
Contributor

Summary:

A useful addition for simpler expressions with more complex indexing/subsetting would be to support logical indices. On the backend, this could just be syntactic sugar which first constructs an array of indices of the true values before passing to the existing indexing implementations.

C++ example:

template <typename EigVec>
inline auto rvalue(EigVec&& v, const char* name,
                   const std::vector<bool>& lgl_idx) {
  // Collect the positions of the true entries; index_multi is 1-based.
  std::vector<int> idx;
  for (size_t i = 0; i < lgl_idx.size(); i++) {
    if (lgl_idx[i]) {
      idx.push_back(static_cast<int>(i) + 1);
    }
  }
  return rvalue(std::forward<EigVec>(v), name, index_multi(idx));
}

As a simpler alternative, we could add a std::vector<bool> constructor to index_multi that performs the same logic as above.
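
A minimal sketch of what such a constructor might look like, written against a stand-in type so it compiles without the Stan headers. The name `multi_idx_sketch` and its layout are assumptions for illustration only, not Stan's actual `index_multi`; the sketch also assumes the stored indices are 1-based, as Stan's multi-indexing expects.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for index_multi; only the idea of the extra
// constructor comes from the proposal above.
struct multi_idx_sketch {
  std::vector<int> ns_;  // 1-based positions to select

  // Existing-style constructor: indices are supplied directly.
  explicit multi_idx_sketch(std::vector<int> ns) : ns_(std::move(ns)) {}

  // Proposed addition: build the indices from a logical mask.
  explicit multi_idx_sketch(const std::vector<bool>& lgl_idx) {
    for (std::size_t i = 0; i < lgl_idx.size(); ++i) {
      if (lgl_idx[i]) {
        ns_.push_back(static_cast<int>(i) + 1);  // shift to 1-based
      }
    }
  }
};
```

Because `std::vector<bool>` and `std::vector<int>` are distinct types, the two constructors do not collide at the C++ level; the open question is how the Stan language would decide which one to call.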

This also has a bit of overlap with the use of bool-ish types in the Stan language, so there might be some edge case or interaction that I'm not thinking of. Thoughts?

Current Version:

v2.35.0

@bob-carpenter
Contributor

bob-carpenter commented Jul 16, 2024

Yes, please. I've wanted this for a long time. Working from Stan itself, I think we'd like to be able to do this:

vector[N] y = ...;
...
array[N] int<lower=0, upper=1> bool_idx = y > 0;
y[bool_idx]

This would require two things:

  1. Extending y > 0 and other conditionals to apply to arrays/vectors y to return an array of 0/1 integer values of the same size. Here this requires broadcasting the 0.
  2. Allow y[bool_idx] style indexing.
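
The first requirement could be sketched in C++ roughly as follows. The function name `greater_than` and the use of `std::vector` are assumptions for illustration, not existing Stan Math signatures; the point is only that a scalar right-hand side is broadcast across the container and the result is an array of 0/1 integers of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: elementwise y > rhs with the scalar rhs broadcast
// across y, returning 0/1 integers.
inline std::vector<int> greater_than(const std::vector<double>& y,
                                     double rhs) {
  std::vector<int> out;
  out.reserve(y.size());
  for (double v : y) {
    out.push_back(v > rhs ? 1 : 0);
  }
  return out;
}

// Elementwise y > rhs for two containers, which must have the same size.
inline std::vector<int> greater_than(const std::vector<double>& y,
                                     const std::vector<double>& rhs) {
  assert(y.size() == rhs.size());
  std::vector<int> out;
  out.reserve(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) {
    out.push_back(y[i] > rhs[i] ? 1 : 0);
  }
  return out;
}
```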

In C++, bool is an integral type whose values false and true convert implicitly to the integers 0 and 1.

There are many more things like this that would be useful. It'd be useful to do a survey of operations provided by the tidyverse and numpy/pandas to see what'd be useful---I often feel while writing Stan code that I want more of these operators and wind up having to write a lot of loops.

@andrjohns
Contributor Author

I don't think we'd be able to treat an array of 0/1 ints as booleans for indexing here, unfortunately, since it would be ambiguous whether an array of all 1s should be treated as "all TRUE, return the whole vector" or as "repeat the first element of the vector".
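
To make the ambiguity concrete, here is a small sketch of the two readings side by side. Both function names are hypothetical and neither is Stan's implementation; `multi_index` follows the existing 1-based multi-index semantics, while `mask_index` is the proposed logical reading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reading A (existing multi-indexing): each entry is a 1-based position.
inline std::vector<double> multi_index(const std::vector<double>& y,
                                       const std::vector<int>& idx) {
  std::vector<double> out;
  out.reserve(idx.size());
  for (int n : idx) {
    assert(n >= 1 && static_cast<std::size_t>(n) <= y.size());
    out.push_back(y[static_cast<std::size_t>(n) - 1]);
  }
  return out;
}

// Reading B (hypothetical logical indexing): each entry is a 0/1 flag.
inline std::vector<double> mask_index(const std::vector<double>& y,
                                      const std::vector<int>& mask) {
  assert(mask.size() == y.size());
  std::vector<double> out;
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (mask[i] != 0) {
      out.push_back(y[i]);
    }
  }
  return out;
}
```

With `y = {10, 20, 30}` and the index array `{1, 1, 1}`, reading A yields `{10, 10, 10}` and reading B yields `{10, 20, 30}`, so the element type alone cannot tell the two apart.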

@bob-carpenter
Contributor

That's a good point. But we don't have a specific boolean type in Stan, do we? And I think in C++ it'd wind up being ambiguous because a bool is an int. One thing we could do is introduce another notation, like Ts select(Ts x, int[] idxs). I might have said x@idxs, but we're using @ for annotations. It would be nice to have general enough comprehensions that we could do something like x[[n in 1:N s.t. idxs[n]]], but that's really clunky and involves a lot of implicit binding in the parsing.

The other place I want something like select is when I have

matrix[M, N] x;
array[J] int<lower=1, upper=M> row_idxs;
array[J] int<lower=1, upper=N> col_idxs;
vector[J] y = pairwise_index(x, row_idxs, col_idxs);

where pairwise_index(x, row_idxs, col_idxs)[j] = x[row_idxs[j], col_idxs[j]]. I need that function all the time. Or maybe it'd be better with tuple indexes as follows.

matrix[M, N] x;
array[J] tuple(int<lower=1, upper=M>, int<lower=1, upper=N>) idxs;
vector[J] y = x[idxs];
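
For reference, the pairwise_index function described above could be sketched in C++ along these lines. This is an illustration under stated assumptions, not an existing Stan function: the matrix is represented as nested `std::vector`s rather than an Eigen type, and the indices are taken to be 1-based as in Stan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: returns y with y[j] = x[row_idxs[j], col_idxs[j]],
// where x is stored row by row and all indices are 1-based.
inline std::vector<double> pairwise_index(
    const std::vector<std::vector<double>>& x,
    const std::vector<int>& row_idxs,
    const std::vector<int>& col_idxs) {
  assert(row_idxs.size() == col_idxs.size());
  std::vector<double> y;
  y.reserve(row_idxs.size());
  for (std::size_t j = 0; j < row_idxs.size(); ++j) {
    const std::size_t r = static_cast<std::size_t>(row_idxs[j]);
    const std::size_t c = static_cast<std::size_t>(col_idxs[j]);
    assert(r >= 1 && r <= x.size());         // row in range
    assert(c >= 1 && c <= x[r - 1].size());  // column in range
    y.push_back(x[r - 1][c - 1]);
  }
  return y;
}
```

The tuple-index variant would carry the same information; it only pairs each row index with its column index in a single array instead of two parallel ones.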
