-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise on out of bounds range access #1061
Conversation
If `step > 1`, it's possible for `range.last` to be pastthe bounds but for the `range` to still be a subset of the columns.
The bounds check on ranges was a little fiddly since |
lib/explorer/shared.ex
Outdated
[slice_min, slice_pseudo_max] = | ||
[first, last] | ||
|> Enum.map(&if(&1 < 0, do: n_cols + &1, else: &1)) | ||
|> Enum.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because ranges can be reverse ordered, I am not sure how much we want to rely on sort. My suggestion would be to validate first and last independently:
for pos <- [first, last] do
cond do
pos >= 0 and pos < n_cols -> :ok
pos < 0 and pos >= -n_cols -> :ok
true -> raise "..."
end
end
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum.slice/2
says it doesn't support reverse ordered ranges yet. Do you want to support them here in anticipation of their future support in Elixir 2.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. If the code above works (big if), then it should be simpler and more future proof. Worth giving it a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Unfortunately though, I think your algorithm doesn't work for the following situation:
df = DF.new(a: [1, 2, 3], b: ["a", "b", "c"], c: [4.0, 5.1, 6.2])
n_cols = DF.n_columns(df) #=> 3
slice = 1..10//10
assert slice.last > n_cols
assert DF.to_columns(df[slice]) == %{"b" => ["a", "b", "c"]}
Here slice
is a valid subset of the columns because even though slice.last > n_cols
, Enum.to_list(slice) == [1]
. This only happens when step > 1
of course.
We can still try and work out a future proof approach though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can adjust the last for a range start..finish//step
, the last element can be calculated using this formula:
start + (div(finish - start, step) * step)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be best to leave this to until we add a slice! to Elixir? 🤔
I think I'd prefer to merge this if possible. The semantics of out-of-bounds are currently different between Nx and Explorer. This PR attempts to bring them in line.
The other one is an empty range altogether. And then there are ranges that raise when out of index.
Sorry if I'm not understanding. To my eyes, 10..0//1
is both empty on its own and out of bounds for a 2-column dataframe and so should raise. Can you clarify what you think should happen with these two cases?
df = DF.new(a: [1], b: [2]) # 2 columns
df[1..0//1] # empty, but within bounds
df[9..0//1] # empty, but out of bounds
I think df[1..0//1]
should return an empty dataframe and df[9..0//1]
should raise. The other option is that all empty ranges (post normalizing) should return an empty dataframe regardless of bounds.
I admit this is a weird case and I don't feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that, if 1..10//10 should not raise for being outbounds because 10 is not actually included, so should not 10..1//1 for the same reason?
I agree that it doesn’t really matter which, but I think for those cases it should be consistent. I think always checking first and last, regardless of step, is the simplest way to go about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that, if 1..10//10 should not raise for being outbounds because 10 is not actually included, so should not 10..1//1 for the same reason?
Ah great point. It's like whack-a-mole with these cases!
I think always checking first and last, regardless of step, is the simplest way to go about it?
I think you're right. I'll go ahead and do that.
The results will differ slightly from Enum.slice/2
. But we need to do what makes sense for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to think of slice as an implementation detail. I think my proposed implementation should work for checking indexes too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to think of slice as an implementation detail.
That makes sense!
I think my proposed implementation should work for checking indexes too!
It did! Hope it's ok that I played a little code golf with it. I realized yours was equivalent to:
if max(abs(first), abs(last)) >= n_cols do
raise ...
end
This won't be supported until Elixir 2.0. But we can make it future proof now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clean!
Fixes: #1005
In the example from that issue we now get: