title | author | category | tags | excerpt | status | ||
---|---|---|---|---|---|---|---|
Equality |
Mattt Thompson |
Objective-C |
nshipster |
The concept of equality is a central point of debate and inquiry in philosophy and mathematics, with far-reaching implications for matters of ethics, justice, and public policy. It is the task of programmers to reconcile our logical and physical understanding of equality with the semantic domains we model. |
|
The concept of equality is a central point of debate and inquiry in philosophy and mathematics, with far-reaching implications for matters of ethics, justice, and public policy.
From an empiricist perspective of the universe, two objects are equal if they are indistinguishable from one another in measurable observations. On a human scale, egalitarians hold that individuals should be considered equal members of the societal, economic, political, and judicial systems they inhabit.
It is the task of programmers to reconcile our logical and physical understanding of equality with the semantic domains we model. There is a subtlety to the question of equality, too often overlooked. Jumping into implementation without sufficient understanding of semantics can lead to unnecessary work that produces incorrect results. Though an understanding of the mathematical and logical system underpinning is equally essential to making things work as modeled.
While the temptation for all technical blog posts is to skim for headings and code samples, please take a few minutes to read and understand all of this. Copying relevant-looking code verbatim without knowing why its there may lead to incorrect behavior. With all seriousness, equality is one of those topics—in Objective-C in particular—where there is still a great deal of confusion.
First and foremost, it is important to make a distinction between equality and identity.
Two objects may be equal or equivalent to one another, if they share a common set of observable properties. Yet, those two objects may still be thought to be distinct, each with their own identity. In programming, an object's identity is tied to its memory address.
NSObject
tests equality with another object with the method isEqual:
. In its base implementation, an equality check is essentially a test for identity. Two NSObject
s are considered equal if they point to the same memory address.
@implementation NSObject (Approximate)
- (BOOL)isEqual:(id)object {
return self == object;
}
@end
For container classes like NSArray
, NSDictionary
, and NSString
, the expected and indeed more useful behavior would be to do a deep equality comparison, to test that each member in the collection is equal.
Subclasses of NSObject
implementing their own isEqual:
method are expected to do the following:
- Implement a new
isEqualTo__ClassName__:
method, which performs the meaningful value comparison. - Override
isEqual:
to make class and object identity checks, falling back on the aforementioned value comparison method. - Override
hash
, which will be described in the next section.
Here's an idea of how NSArray
might do this (ignoring, for this example, that as a class cluster, the actual implementation would be significantly more complicated):
@implementation NSArray (Approximate)
- (BOOL)isEqualToArray:(NSArray *)array {
if (!array || [self count] != [array count]) {
return NO;
}
for (NSUInteger idx = 0; idx < [array count]; idx++) {
if (![self[idx] isEqual:array[idx]]) {
return NO;
}
}
return YES;
}
- (BOOL)isEqual:(id)object {
if (self == object) {
return YES;
}
if (![object isKindOfClass:[NSArray class]]) {
return NO;
}
return [self isEqualToArray:(NSArray *)object];
}
@end
The following NSObject
subclasses in Foundation have custom equality implementations, with the corresponding method:
NSAttributedString -isEqualToAttributedString:
NSData -isEqualToData:
NSDate -isEqualToDate:
NSDictionary -isEqualToDictionary:
NSHashTable -isEqualToHashTable:
NSIndexSet -isEqualToIndexSet:
NSNumber -isEqualToNumber:
NSOrderedSet -isEqualToOrderedSet:
NSSet -isEqualToSet:
NSString -isEqualToString:
NSTimeZone -isEqualToTimeZone:
NSValue -isEqualToValue:
When comparing two instances of any of these classes, one is encouraged to use these high-level methods rather than isEqual:
.
However, our theoretical implementation is yet incomplete. Let's turn our attention now to hash
(after a quick detour to clear something up about NSString
:
As an interesting aside, consider the following:
NSString *a = @"Hello";
NSString *b = @"Hello";
BOOL wtf = (a == b); // YES
Let it be perfectly clear that the correct way to compare NSString
objects is to use -isEqualToString:
. Under no circumstances should you compare NSString
with the ==
operator.
So what's going on here? Why does this work, when the same code for NSArray
or NSDictionary
literals wouldn't work?
It all has to do with an optimization technique known as string interning, whereby one copy of immutable string value is copied for each distinct value. NSString *a
and *b
point to the same copy of the interned string value @"Hello"
. Note that this only works for statically defined immutable strings.
Interestingly enough, Objective-C selector names are also stored as interned strings in a shared string pool.
themoreyouknow.gif
.
The primary use case of object equality tests for everyday object-oriented programming is to determine collection membership. To keep this fast, subclasses with custom equality implementations are expected to implement hash
as well:
- Object equality is commutative (
[a isEqual:b]
⇒[b isEqual:a]
) - If objects are equal, then their
hash
values must also be equal ([a isEqual:b]
⇒[a hash] == [b hash]
) - However, the converse does not hold: two objects need not be equal in order for their hash values to be equal (
[a hash] == [b hash]
¬⇒[a isEqual:b]
)
Now for a quick flashback to Computer Science 101:
A hash table is a fundamental data structure in programming, and it's what enables NSSet
& NSDictionary
to have fast (O(1)
) lookup of elements.
We can best understand hash tables by contrasting them to arrays:
Arrays store elements in sequential indexes, such that an Array of size n
will have slots at positions 0
, 1
, up to n - 1
. To determine where an element is stored in the array (if at all), each position would have to be checked one-by-one (unless the array happens to be sorted, but that's another story).
Hash Tables take a slightly different approach. Rather than storing elements sequentially (0
, 1
, ...
, n-1
), a hash table allocates n
positions in memory, and uses a function to calculate a position within that range. A hash function is deterministic, and a good hash function generates values in a relatively uniform distribution without being too computationally expensive. A hash collision occurs when two different objects calculate the same hash value. When this happens, the hash table will seek from the point of collision and place the new object in the first available place. As a hash table becomes more congested, the likelihood of collision increases, which leads to more time spent looking for a free space (hence why a hash function with a uniform distribution is so desireable).
One of the most common misconceptions about implementing a custom hash
function comes from affirming the consequent, thinking that hash
values must be distinct. This often leads to needlessly complicated implementations involving the magical incantation of prime numbers copied from Java textbooks. In reality, a simple XOR
over the hash values of critical properties is sufficient 99% of the time.
The trick is in thinking about what the critical value of an object is.
For an NSDate
, the time interval since a reference date would be sufficient:
@implementation NSDate (Approximate)
- (NSUInteger)hash {
return (NSUInteger)abs([self timeIntervalSinceReferenceDate]);
}
For a UIColor
, a bit-shifted sum of RGB components is a convenient calculation:
@implementation UIColor (Approximate)
- (NSUInteger)hash {
CGFloat red, green, blue;
[self getRed:&red green:&green blue:&blue alpha:nil];
return ((NSUInteger)(red * 255) << 16) + ((NSUInteger)(green * 255) << 8) + (NSUInteger)(blue * 255);
}
@end
Bringing it all together, here's how one might override the default equality implementation in a subclass:
@interface Person
@property NSString *name;
@property NSDate *birthday;
- (BOOL)isEqualToPerson:(Person *)person;
@end
@implementation Person
- (BOOL)isEqualToPerson:(Person *)person {
if (!person) {
return NO;
}
BOOL haveEqualNames = (!self.name && !person.name) || [self.name isEqualToString:person.name];
BOOL haveEqualBirthdays = (!self.birthday && !person.birthday) || [self.birthday isEqualToDate:person.birthday];
return haveEqualNames && haveEqualBirthdays;
}
#pragma mark - NSObject
- (BOOL)isEqual:(id)object {
if (self == object) {
return YES;
}
if (![object isKindOfClass:[Person class]]) {
return NO;
}
return [self isEqualToPerson:(Person *)object];
}
- (NSUInteger)hash {
return [self.name hash] ^ [self.birthday hash];
}
For the curious and pedantic, see this post from Mike Ash for an explanation of how
hash
implementations might be improved by bit-shifting or rotating composite values that may overlap.
While all of this has been an interesting exercise in epistemology and computer science, there is a lingering pragmatic detail:
You don't usually need to implement this.
There are many situations where the default identity check (two variables point to the same address in memory) is desirable behavior. This comes as a consequence of the limitations of data modeling.
Take, for instance, the previous example of the Person
class. It's not inconceivable that two individuals would share a common name and birthday. In reality, this crisis of identity would be resolved by additional information, whether it's a system-dependent identifier like a Social Security Number, their parents' identities, or any other physical attributes.
Yet even that additional information is not entirely foolproof. After all, that person could be cloned, teleported, or whisked away into a parallel universe. Unlikely? Sure. But much of the challenge in modeling systems is dealing with imperfect assumptions. Just saying.
Ultimately, it's up to the abstraction to isolate the significant, identifying features that the system cares about, and disregard the rest. The developer can then decide whether objects will be used in such a way that set membership calculations should care about. In a program that only records name
and birthday
, it may perfectly correct to treat congruent instances as distinct entities.
Hopefully, after all of this explanation, we all stand with equal footing on this slippery subject.
As humans, we strive to understand and implement equality in our society and economy; in the laws and leaders that govern us; in the understanding that we extend to one another as we journey through existence. May we continue towards that ideal, where individuals are judged by the contents of their character, just as we judge a variable by the contents of its memory address.