Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BuiltList won't compute hash as part of ==. #269

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lrhn
Copy link

@lrhn lrhn commented Jan 3, 2023

The implementation of BuiltList.operator== used this.hashCode != other.hashCode as a short-cut in operator ==. However, computing hashCode can be as expensive as doing equality, since it has to visit all the same objects (otherwise the hash code won't be consistent with equality) and do a computation on each.

This change makes the hashCode only be used if it's already computed by both this and other, in which case it should be free.

It's a change in performance-behavior. A single == won't do double work, but repeated ==s on the same objects might have been more efficient if the hash code was computed eagerly. For elements in a set or map, the hash code will be computed anyway, so it should not affect those. Having a list of BuiltLists and doing repeated indexOf with the same values on the list might suffer.

I believe the trade-off is worth it, because most lists are not compared for equality, and those that are, are either one-off comparisons, or used as map keys or set elements.
(But be free to disagree.)

The implementation of `BuiltList.operator==` used `this.hashCode != other.hashCode` as a short-cut in operator `==`.
However, computing `hashCode` can be as expensive as doing equality, since it has to visit all the same objects (otherwise the hash code won't be consistent with equality) and do a computation on each.

This change makes the `hashCode` only be used if it's already computed by both `this` and `other`, in which case it should be free.

It's a change in performance-behavior. A single `==` won't do double work, but repeated `==`s on the same objects might have been more efficient if the hash code was computed eagerly. For elements in a set or map, the hash code will be computed anyway, so it should not affect those. Having a list of `BuiltList`s and doing repeated `indexOf` with the same values on the list might suffer.
I believe the trade-off is worth it, because most lists are not compared for equality, and those that are, are either one-off comparisons, or used as map keys or set elements.
other._hashCode != null &&
_hashCode != other._hashCode) {
return false;
}
for (var i = 0; i != length; ++i) {
if (other[i] != this[i]) return false;
Copy link
Author

@lrhn lrhn Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For curiosity: This is the opposite order of what the platform collections otherwise use. For example. contains will use this[i] == needle as a test, not the other way around.
Is there a reason this is not this[i] == other[i]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason that I can remember :)

@davidmorgan
Copy link
Contributor

Thanks Lasse!

The use case I had in mind when I wrote this was for built_value models. Although built_value classes have the option to cache hashCode, they don't by default. So I thought, if there is some model e.g.

abstract class Model {
  BuiltList<Account> get accounts;
}

abstract class Account {
  BuiltList<Card> cards;
}

then I thought it might be a win if there is some caching of the list hash codes. It's not that the lists are expected to be large--rather that they may contain complex objects that don't cache their hash codes.

But, this was just a guess, I didn't (and still don't) have data to back it up.

I guess a primary use case for built_collection is in UIs, where change detection between very similar data models is probably important. But then it's actually "quickly return true" that's important, not "quickly return false". So I don't think it's particularly convincing.

I thought about calculating a more robust hash to allow quickly returning "almost certainly true" but did not actually explore that option :) in the end we would need real data to know if it's worth it.

Did you have any particular app/codebase in mind for this PR? If you have any real data that can be quite compelling against my lack of data ;)

Thanks.

@lrhn
Copy link
Author

lrhn commented Jan 9, 2023

No specific use, I just ended up in the method when doing a code review and wondered about the design.

It seemed like overkill to check hash codes first, and then do equals afterwards anyway, precisely because the true path is usually the longest already.
Having cached hash codes makes it an efficient quick-check, but if the hash code is not cached, computing it is likely as expensive as doing the equals check. That can still be valuable if it's expected that the hash code will be used again anyway, but for something like adding to a hash set or map, the hash code will be computed before the == check, so checking whether there is a cached hash seemed like the best compromise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants