foonathan::blog()

Thoughts from a C++ library developer.

Mathematics behind Comparison #3: Ordering Relations in C++

In order to sort a collection of elements you need to provide a sorting predicate that determines when one element is less than the other. This predicate must “induce a strict total ordering on the equivalence classes” according to cppreference. Wait, what?

The upcoming C++ spaceship operator implements a three way comparison, i.e. it is a single function that can return the results of <, == and > combined. But related to it are terms like “strong equality” and “weak ordering” which are somewhat confusing if you don’t have the mathematical background.

So let’s untangle it: This series will explain both the mathematics behind equality and ordering, as well as give concrete guidelines for implementing the comparison operators and the spaceship operator.

The previous part was very math heavy but necessary: It introduced the mathematical terminology for ordering relations. With that done we can finally talk about how it applies to C++.

C++ Mechanics for Ordering Relations

Really quick recap: When we have two elements they can either be equal, equivalent, one less/greater than the other, or incomparable.

In mathematics this relation is specified with a binary relation that can either implement some form of or some form of <. In C++ we have the following options:

  • Overload the comparison operators <, <=, >=, >
  • Implement a named predicate (bool-returning) function that implements the corresponding mathematical relation
  • Overload the spaceship operator <=>

I’ll talk about the spaceship operator in detail in the next part, so let’s focus on the first two options only. But before we talk about the ways of implementing ordering relations for a type, we first need to talk about situations where we don’t want an ordering relation.

Unordered Types

If you remember the terminology for the first part, a type defines a set of values. But for some types this set of values is not obvious. I used the button as an example, you can’t really talk about it in a mathematical way. And if you can’t do that, this is a big sign that you don’t really known what it means to be equal.

The same applies here as well:

Rule: If you don’t know the value of your type, don’t implement an ordering relation.

Ordering relations are inherently mathematical constructs, so you need to know the mathematical representation for your type. More on the distinction in the first part.

Corollary: If your type doesn’t have an equivalence relation, don’t provide an ordering relation.

But just because you can talk about your type in mathematics doesn’t mean it should be ordered:

Rule: Only implement an ordering relation for a type if it is actually meaningful.

For example, you can easily define an ordering on any type by simply comparing each member in turn. This is called a lexicographical comparison because it is like the ordering on a string: Each character in turn.

This is even a total ordering if the member orderings are total.

However, it doesn’t make a lot of sense for most types.

Consider std::complex: it is basically a pair of two floating-point types, the real part and the imaginary part. So you could implement a total ordering by first comparing the real part, and if they are equal, comparing the imaginary part.

But this ordering doesn’t play nicely with the mathematical properties of a complex number: For example, for any real number x * x ≥ 0. But i * i = -1. And -1 is less than 0 in our order. This means that we wouldn’t have this property, which is unfortunate.

So there is no operator< on a std::complex.

There are parts of the standard library that require an ordering, however. std::set needs it to do O(log n) lookup, std::sort() needs it to actually sort, etc. But the lack of operator< on a std::complex is not a problem: If you need to put it in a std::set, you can still write the lexicographical comparison and provide it as a comparison predicate. There it doesn’t actually matter whether or not the order has any fancy properties, as long as it is total, you get the faster lookup. And when you sort a sequence of complex number you usually have something custom in mind anyway.

Corollary: Don’t implement a general ordering relation for a type, just because some (standard) library container or algorithm requires it. Pass a custom predicate to them instead.

Sadly, the standard library itself seems to follow a different advice. A lot of the types have an overloaded operator <, for example all containers implement a lexicographical comparison that way. For std::string it makes sense, but for std::vector<int>? I don’t think so: It can be useful, convenient, but it is not very meaningful.

I personally follow this rule of thumb:

Guideline: Do not provide a comparison operator for most types.

When in doubt, don’t do it.

This also applies to a lot of programming design in general.

The first time you actually need an ordering, implement it as a predicate and think about whether it is useful enough to be provided generally. For most types you don’t actually ever need an ordering.

Designing Ordering Relations in C++

Okay, so we have a type where we are absolutely sure that we need to provide an ordering: What interface should we provide? The comparison operator overload or a predicate function?

First, let’s get some basic rules out of the way regarding overloaded comparison operators:

Rule: If you overload one of operator<, operator<=, operator>=, operator>, you should also overload all others and so that they implement the same ordering.

This should go without saying. Operators are mathematical constructs with mathematical meaning, they’re not emojis than can mean whatever you want them to mean.

Rule: The comparison operators should implement a total ordering.

If you don’t follow this rule, you might accidentally use your type in a set or sort algorithm without specifying a custom comparison predicate. Your code will still compile, but it will not work, as the algorithms expect a total ordering. So in order to prevent this mistake, the comparison should be total.

Rule: The comparison operators should implement an ordering inducing equality, not just equivalence.

This rule is more subtle: The algorithms don’t care about equality vs equivalence, both work. However, when you write a <= b this should be equivalent to a < b || a == b. And as I’ve argued in the first post, a == b should mean equality not equivalence. So a <= b should induce equality, not just some equivalence.

This also means:

Rule: If your type has overloads of the comparison operators, also overload the equality operations. The equality induced by the comparison operators should match the equality implemented by the equality operations.

If you have implemented a total order using <, you’ve also defined an equality. So there is not really any point in hiding that fact from the user, so you should overload == and != checking that equality. And again, it should go without saying that you should implement the same equality in both operators.

So, the comparison operators should implement a (strict) total ordering, with matching == and !=. However, a type can have multiple total orders:

Rule: The comparison operators should implement the intuitive, obvious total order for your type.

If there isn’t one, don’t overload the comparison operators.

This leaves the predicate function for non-intuitive total orderings and the other ordering relations. But should it be the < equivalent or the <= equivalent?

Rule: Implement a preorder or partial order by writing a named predicate function that returns true if two arguments are less than or equal.

You have no choice: You can’t implement a preorder / partial order with <: it will not allow deducing equivalence. So you have to use <=.

Rule: When implementing a total preorder or a strict weak order, provide a named comparison function that returns true if the first argument is strictly less than the second argument (i.e. the strict weak order).

For a total ordering relation that provides equivalence and not equality (total preorder, strict weak order), you could implement the or < version. However, if you implement < you can directly use the function as a predicate for algorithms requiring a compare.

So, to sum up:

  • the obvious total ordering: overload all comparison operators and equality operations
  • a less obvious total ordering: named predicate implementing <
  • a total preorder / strict weak order: named predicate implementing <
  • a partial order or preorder: named predicate implementing <=

Implementing Ordering Relations in C++

Like with the equivalence relations last time we again need to translate objects into mathematical constructs. And again, this is done by talking about the value of your object, and then implementing an ordering relation on the set of your values.

And this is done like the implementation of an equality function: You compare the value of your object by comparing the salient properties.

The easiest case is a compound type where all you need is a lexicographical comparison of the salient properties: Where with equality you chain the == comparison, with comparison you chain <. Note that you automatically have a total order if all members have a total order.

But again: you should not do that in an overloaded operator unless it is really the obvious, total order. It is a good strategy in the comparison predicate for std::set, however.

Consider a simple pair, for example:

template <typename T, typename U>
struct pair
{
    T first;
    U second;
};

Of course, it is missing around 230 lines to become a std::pair.

The equality is very straightforward:

template <typename T, typename U>
bool operator==(const pair<T, U>& lhs, const pair<T, U>& rhs)
{
    return lhs.first == rhs.first && lhs.second == rhs.second;
}

Here order of comparisons don’t matter, but due to short circuiting you should compare the members that are different most often first. This is not applicable for a generic type such as std::pair though.

For < the order of comparisons is important. It doesn’t really matter for the user too much, but changing the order changes the ordering of the type, so is a breaking change. So with the classical order for a pair we end up with:

template <typename T, typename U>
bool operator<(const pair<T, U>& lhs, const pair<T, U>& rhs)
{
    if (lhs.first != rhs.first)
        // sort by first member if they're not equal
        return lhs.first < rhs.first;
    else
        // sort by second member
        return lhs.second < rhs.second;
}

Note that std::pair implements this simple lexicographical comparison as the comparison overloads. Whether or not this is a good idea is up for debate.

If you have a lot of members writing this manually can be tedious. As a trick you can also use std::tie() to create a std::tuple of references to your members, then use the provided operator< of tuple:

return std::tie(lhs.first, lhs.second) < std::tie(rhs.first, rhs.second);

And if you have members of the same type, you can use the std::lexicographical_compare() algorithm.

If you don’t need a simple lexicographical comparison, things require a bit more manual work. For example, consider the operator< of std::optionaL<T>: It creates a new sorting order where std::nullopt (the empty optional) comes before all other T objects.

And again, whether it should have an operator< can be discussed.

For me, std::optional<T> is an extension of T with an additional null state. So its set of values is the set of all T values and one additional value. If this set was ordered before, the extension should be ordered as well.

For others, std::optional<T> is a container of T that can either have size 0 or 1. Then the set of optional values is the set of an empty container and a set of a container containing one T. There it is less clear that it should be ordered.

The operator< can look like this:

template <typename T>
bool operator<(const optional<T>& lhs, const optional<T>& rhs)
{
    if (!lhs)
        // empty optional less than all non-empty
        return !rhs.empty();
    else if (!rhs)
        // left hand side is never less than an empty optional
        return false;
    else
        // otherwise compare the members
        return lhs.value() < rhs.value();
}

But once you have an operator<, implementing the other ones is straightforward:

bool operator<=(const T& lhs, const T& rhs)
{
    // (lhs ≤ rhs) iff (lhs < rhs or lhs == rhs) 
    // and (lhs == rhs) iff !(lhs < rhs) and !(rhs < lhs)
    return !(rhs < lhs);
}

bool operator>(const T& lhs, const T& rhs)
{
    // (lhs > rhs) iff !(lhs <= rhs) iff rhs < lhs
    return rhs < lhs;
}

bool operator>=(const T& lhs, const T& rhs)
{
    // (lhs >= rhs) iff (lhs > rhs or lhs == rhs),
    // (lhs > rhs) iff (rhs < lhs)
    // and (lhs == rhs) iff !(lhs < rhs) and !(rhs < lhs)
    return !(lhs < rhs);
}

You could also use std::rel_ops to automate that.

… I’m kidding.

Implementing the predicate functions for other orderings is similar. The non-total orderings require a bit more thinking to get the incomparable and equivalence properties correct, but there is no general advice I can give. You must work it out on a case-by-case basis and verify that your ordering fulfills the required axioms.

Conclusion

The comparison operators should only be overloaded if they implement an obvious total ordering inducing equality, not just equivalence. For any other ordering relation implement the < version as a named predicate function.

When in doubt, don’t overload the comparison operators. Just manually use predicates when required by containers or algorithms.

Note that this advice changes slightly once the spaceship operator arrives. We’ll look at that in the next part.

This post was made possible by my Patreon supporters. If you'd like to support me as well, please head over to my Patreon and do so! One dollar per month can make all the difference.