-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance ranking code #589
Enhance ranking code #589
Conversation
_rank() helper provides: 1) correct support for n-dim, n>1, input arrays 2) minimizes code duplication 3) passthrough of sortperm() args 4) macro-less support for missing values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the very slow reponse...
src/ranking.jl
Outdated
return rks | ||
end | ||
|
||
|
||
""" | ||
ordinalrank(x; lt = isless, rev::Bool = false) | ||
ordinalrank(x; sortkwargs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating the list of supported arguments is more user-friendly IMO, and it doesn't take a lot of space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've expanded lt, by and rev, because they are more commonly used, and alg and order definitions are too long.
The docstring now explicitly refers to the sort(x)
keywords in the description.
src/ranking.jl
Outdated
xv = Vector{T}(undef, length(inds)) | ||
@inbounds for (i, ind) in enumerate(inds) | ||
xv[i] = x[ind] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just do xv = view(x, inds)
? If making a copy is faster, then xv = disallowmissing(view(x, inds))
as in the old code or xv = convert(AbstractArray{T}, x)
is both shorter and much more efficient e.g. for CategoricalArray
.
n = _check_randparams(rks, x, p) | ||
|
||
if n > 0 | ||
@inbounds if n > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, maybe replace these while
loops with for i in 2:n
and so on? They are hard to read currently.
src/ranking.jl
Outdated
("1234" ranking) of an array. The `lt` keyword allows providing a custom "less | ||
than" function; use `rev=true` to reverse the sorting order. | ||
All items in `x` are given distinct, successive ranks based on their | ||
("1234" ranking) of an array. Supports the same keyword arguments as `sort(x; sortkwargs...)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sortkwargs
isn't defined now. Same below.
("1234" ranking) of an array. Supports the same keyword arguments as `sort(x; sortkwargs...)` | |
("1234" ranking) of an array. Supports the same keyword arguments as the `sort` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in L48 I sort-of introduce the sortkwargs, because otherwise later in the docstring it would not be entirely clear that the keyword args are passed through to sort
to define the element order. I think from the context (supports the same keyword args as ...) it should be clear that sortkwargs refers to the keyword arguments of ordinalranks(). But if you have a better suggestion, I'm eager to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I would just have said "in the sorted vector" in the next sentence.
As you prefer, but if you keep this then please replace sort(x)
with just sort
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds good. I also simplified a bit the language in the other ranking docstrings.
CI failed on nightly, but it's not ranking-related. @nalimilan Thanks for the review! There's also #590 if you have time :) |
src/ranking.jl
Outdated
function _rank(f!, x::AbstractArray{>: Missing}, R::Type=Int; sortkwargs...) | ||
inds = findall(!ismissing, vec(x)) | ||
isempty(inds) && return missings(R, size(x)) | ||
T = nonmissingtype(eltype(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticed that T
is not used anymore, will remove this line
Small cleanups in the ranking code. Added
@inbounds
annotations for speed.Common code is moved into
_rank()
method, which allows to:x::AbstractVector
sortperm()
kwargs are just passed through, so, in addition to lt and rev, ranking also supports by (so it could be applied to custom types) and alg