Equals != Equality

The other day, I read the article How to Write an Equality Method in Java. Which reminded me that I wanted to write an entry about equality in programming. I cannot hope to cover this topic adequately in a blog post. My goal is merely to show that equality is very hard and that we often have a sever misconception about it.

Most programming languages give us several ways to check equality. Java has the operator == to test equality on primitives and references and the method boolean Object.equals(Object o) to test semantic equality of objects (we will discuss later what that means). Common Lisp has a lot more (eq, eql, equal, equalp, =, string=, …). And did you know about the === operator in Javascript?

Semantics of Equality

What does it mean that two Objects are the same? In the OO world a naive approach is to say, that two objects are the same when they are instances of the same class and all members are the same. This is a nice recursive approach, and it is also not to hard to create a terminating implementation, if you take a bit care of circular references.

However, let us consider the numerical tower. Is the integer 2 the same as the float 2.0? If you interpret class in a Java sense, then they are not the same according to the above definition, because they are not instances of the same class. You might argue that integers can be seen as a super-class of floats (meaning that integers are a subset of floats, i.e., every integer is a float). So you could let the more specialized class decide the equality.

This approach does not help us with a different problem: Two strings, according to our definition, are equal if both strings consist of the same characters (in the same order). So the strings “FooBar” and “foobar” are not equal. But what if we do not care fore case sensitivity, e.g. because our file system does not care for it? Building up a set with all file names is a reasonable use-case. A standard Java HashSet<String> would not be very helpful here.

As a matter of fact, the Java String acknowledges this problem and provides equalsIgnoreCase(), but I cannot tell HashSet to use this method instead of equals() (without extending String).

Object relational mappers (ORM) have a similar, but slightly different, problem. They have to make sure that objects in the data base and in the program are matched up correctly. Here the instances of the class Person<code> with name Maria Smith and Maria Jones might be the same object, just at a different point of time. This is usually mitigated by surrogate keys. So these objects are the same with respect to their ID (the surrogate key), but not with respect to the data they hold. A different distinction for the ORM system when it needs to synchronize objects with the persistency context.

Equivalence Relations

The equals() methods should of course implement an equivalence relation. However, very often they do not. An equivalence relation is reflexive (a = a), symmetrical (if a = b, then b = a), and transitive (if a = b and b = c, then a = c). It is quite easy to break the symmetry the class structure is not considered correctly. Let us assume you have a class A and a class B that is a subclass of A (i.e., it extends A in Java).

class A {
  int x,y;
  boolean equals(Object o) {
    if (o instanceof A) {
      A that = (A)o;
      return that.x == this.x && that.y == this.y;
    }
    return false;
  }
}

class B {
  int z;
  boolean equals(Object o) {
  if (o instanceof B) {
    B that = (B)o;
    return that.z == this.z && super.equals(o);
  }
}

If we have an instance a of A with x=1 and y=2 and an instance b of B with x=1, y=2, and z=3, then a.equals(b) will be true, but b.equals(a) will not. Just saying, only instances of the same class may be equal, is not always a viable solution, as explained above.

There Cannot be Just One

The real problem is, that there is more than one equivalence relation. Java (and many other OO languages) forces us to come up with one (more or less) canonical equivalence relation, rather than being able to define the real equivalence relation one currently needs.

A map implementation where you can define the equivalence relation would help. A Java interface for such a definition could look like this:

public interface Equality {
  boolean equals(Object o1, Object o2);
  int hash(Object o);
}

The problem is, that this equality definition needs to know about all classes whose instances might end up in the map. This problem can be solved by open classes and multiple dispatch (Java supports neither), but also by other techniques.

Mutability

The article I mentioned in the first paragraph mentions defining equality on mutable fields as a pitfall. IMHO the final keyword on variables and fields is underutilized in Java (and final should be the default) but this is beside the point. However, it is not so important that the fields used in the equals method are immutable, but that they do not change as long as the object is in the map. The way a lot of libraries and frameworks (e.g., Spring and Hibernate) work, you must give them mutable fields, but these fields might not change after the initialization is done (which is more than just calling the constructor).