Skip to content

Can isless be defined for everything? #34815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tkf opened this issue Feb 19, 2020 · 9 comments
Open

Can isless be defined for everything? #34815

tkf opened this issue Feb 19, 2020 · 9 comments
Labels
feature Indicates new feature / enhancement requests sorting Put things in order

Comments

@tkf
Copy link
Member

tkf commented Feb 19, 2020

I thought my comment #34744 (comment) may be distracting the original issue so I open it as a separate issue. My questions are:

  1. Is it possible to define isless for everything?
  2. Can it have practically fast enough implementation?
  3. Is it reasonable to use an arbitrary ordering for isless?

I think the answer to point 1 is yes, since I can do isless(a, b) = isless(objectid(a), objectid(b)). This is stupid, but I think it's OK as a proof of possibility. In particular, this means that, if we define equivalence by mapping to an integer, we get a total ordering "for free."

For point 2, I think there are cases where optimization is trivial and covers a wide range of practical use. For example, we can lower isless(a::T, b::T) with struct type T to isless on the tuples of field values.

If the philosophy behind the compatibility between hasing and sorting is to support consistent behavior of hash-based and comparison-based containers, I think the answer to point 3 is yes too. But I don't think I have thought enough about it so maybe I'm missing something. In particular, it may be impossible to avoid ordering that changes julia process to process (like how hash behaves).

@StefanKarpinski
Copy link
Member

Why is it desirable?

@tkf
Copy link
Member Author

tkf commented Feb 19, 2020

I suppose this is required for storing arbitrary key/element in a comparison-based dictionary/set? This is required for sort-based group-by, too.

(Though I guess I can live without this. I may be getting too greedy here.)

I guess a practical "baby-step" request here is to define isless(::T, ::T) when T is a struct:

isless(a::T, b::T) where {T} =
    if isstructtype(T)
        isless(_fieldvalues(a), _fieldvalues(b))
    else
        throw(MethodError(isless, (a, b)))
    end
_fieldvalues(x) = ntuple(i -> getfield(x, i), nfields(x))

Maybe even better definition is to be flexible for parametric types if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b)) instead of if isstructtype(T).

Given that hash is automatically defined for everything, being able to compare with the same type seems to be a reasonable extension.

@tkf
Copy link
Member Author

tkf commented Feb 19, 2020

Maybe even better definition is to be flexible for parametric types if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b)) instead of if isstructtype(T).

Oh, no, wait. This does not work when there are type parameters that are not determined by the field types.

@timholy
Copy link
Member

timholy commented Feb 19, 2020

Not a fan. There are times when absence of an operation helps you think clearly: for example if your calculation results in the answer 3s + 4m (s = seconds, m = meters) then you know you've made a mistake. If I'm working in a domain where I expect rotational invariance, then the only case in which a collection of points can be placed in a partial order is when they fall on a single line through the origin. If you suddenly give me an arbitrary way to order them anyway, you've made it harder for me to think clearly.

@tkf
Copy link
Member Author

tkf commented Feb 19, 2020

I agree that a lack of definitions helps to find bugs. But I think that's in direct tension to using isless for containers where you just need an ordering, not the one compatible with the mathematical objects you are modeling with the type system. I think this is more about < falling back to isless automatically. Though I guess it means any kind of automatic definition of isless requires a breaking change in <...

@rapus95
Copy link
Contributor

rapus95 commented Feb 19, 2020

I guess the better approach would be to write a new dict type which accepts a type or function for ordering its elements. Well, I guess there's such thing already. Defining isless for all pairs explicitely is a bad idea IMO. If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there. That's the cleanest approach since it doesn't do weird default behaviour for incomparable objects.
In other words: Don't introduce arbitrary decisions into fallbacks. Keep the rule to only define sane defaults and error otherwise.

@tkf
Copy link
Member Author

tkf commented Feb 19, 2020

If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there.

I think I'm convinced this is the only reasonable approach at least within 1.x. Traits for querying the existence of isless and isequal are required for making this compatible for customized isless and isequal, though.

@rapus95
Copy link
Contributor

rapus95 commented Feb 19, 2020

Regarding that I was recently (on slack) thinking about a more generic approach of isless which also tracks a ComparatorContext{T1, T2} with it. Say, we'd get a 3-arg isless. Then the following lines could be examples of what would be possible very concisely)

isless(::HashComparator, a, b) = isless(CanonicalComparator(UInt), hash(a), hash(b))
isless(::CanonicalComparator{UInt}, x::UInt, x::UInt) = #some native call
isless(a,b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
isless(::PropagateMissing, a::Missing, b) = missing
isless(::PropagateMissing, a, b::Missing) = missing
isless(x::PropagateMissing, a, b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
<(a,b) = isless(PropagateMissing(), a, b)

We can extend that behaviour to isequal. Especially PropagateMissing would be concise aswell.

Then we'd also add a trait istotalordering(x) which contracts that isless and isequal are both defined if at least one of them is defined and that known mathematical requirements for a total ordering hold. Each appropriate comparator could define that to be true. HashComparator could do that for example.

@jlapeyre
Copy link
Contributor

The semantics of ordering objects mathematically and ordering them for reproducibility are distinct. Trying to use isless for both purposes would be confusing and burdensome.

I agree that it's sometimes useful to impose a total order on a collection of (or all) objects. For example, an order on dictionary keys, or the order in which terms are printed in a symbolic algebra program. I have to totally order terms in expressions printed (and stored) by Symata. In this case, trying to extend <= is ambiguous and doesn't give you the lexicographical ordering you want anyway. I used a distinct comparison function for lexicographic ordering. I suspect that even defining a separate lexicographic ordering in Julia would be difficult, because different orderings would be better for different use cases.

@brenhinkeller brenhinkeller added the feature Indicates new feature / enhancement requests label Nov 20, 2022
@nsajko nsajko added the sorting Put things in order label Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests sorting Put things in order
Projects
None yet
Development

No branches or pull requests

7 participants