Operator precedence is broken

24 Jul 2017 by Jonathan

A discussion on Twitter got me thinking about operator precedence. It is a crucial part of most programming languages as it dictates the meaning of expressions.

Interestingly enough, it is practically the same in almost all programming languages, even ones that radically try to be a better alternative to an established language. So apparently operator precedence is a solved problem, right?

Well, I don’t think so. I think operator precedence is fundamentally flawed and could easily be improved.

This blog post is different from my usual ones, as I’m not going to talk specifically about C++. Instead I’m going to share my thoughts on programming language design. Note that I am no compiler dev; I’m just someone who uses programming languages. If you like this post, please let me know and I’ll do more of these.

I’m going to use C++ as an example throughout this post, but this applies to any programming language with conventional operators.

Consider the following piece of code:

x = a & b + c * d && e ^ f == 7;

How would you react if you read that code?

You would probably blame the person who wrote it.

“Use parentheses!”

“Refactor it out into multiple smaller expressions!”

And that is a reasonable reaction. In fact, this example is taken from the C++ Core Guidelines in a rule about complicated expressions that should be avoided.

Interestingly enough this is not an anti-example for the next rule “If in doubt about operator precedence, parenthesize”.

It’s a common guideline in most languages to parenthesize if the operator precedence isn’t clear. To quote the Core Guidelines again: not everyone has the operator precedence table memorized. And one shouldn’t need to memorize the precedence in order to understand basic expressions.
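For reference, here is a small sketch of how a C++ compiler groups the expression from the start of this post. The function names are made up for illustration, and the assignment to x is replaced by a return value so the two forms can be compared.

```cpp
#include <cassert>

// The expression exactly as written. Compilers may warn about this line
// (for example GCC's -Wparentheses), which is rather the point.
inline bool as_written(int a, int b, int c, int d, int e, int f) {
    return a & b + c * d && e ^ f == 7;
}

// The same expression with the grouping C++ applies made explicit:
// * before +, + before &, == before ^, and && last.
inline bool fully_parenthesized(int a, int b, int c, int d, int e, int f) {
    return (a & (b + (c * d))) && (e ^ (f == 7));
}

// A few spot checks that both forms agree.
inline void check_grouping() {
    assert(as_written(6, 1, 1, 2, 7, 7) == fully_parenthesized(6, 1, 1, 2, 7, 7));
    assert(as_written(0, 1, 1, 2, 7, 7) == fully_parenthesized(0, 1, 1, 2, 7, 7));
    assert(as_written(6, 1, 1, 2, 1, 7) == fully_parenthesized(6, 1, 1, 2, 1, 7));
}
```

Note that == binds stronger than ^, so f == 7 is evaluated first and its boolean result is xor-ed with e. That is probably not what most readers would guess.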

However, I don’t think the author of bad code is really to blame here. It’s probably rare that someone opens the editor/IDE and thinks “today, I’m just going to abuse operator precedence really hard”.

At least I hope so.

And granted the above example is deliberately extreme, but think of a more reasonable example where you complained about missing parentheses. Maybe it was completely clear for the author that this operator binds stronger than that, so the expression is well-formed?

The operator precedence wasn’t picked at random, there is a certain logic behind it.

Again: At least I hope so.

So it could be expected that someone just intuitively knows the relative precedence of two operators and just didn’t think parentheses would be needed there.

I think the real blame lies with the language that allowed him or her to write such an awful expression. It should have prevented writing expressions that are potentially ambiguous to a human reader.

Don’t get me wrong - I’m all for languages that provide the programmer with as much freedom as possible.

I’m programming in C++ after all.

But there is no benefit to writing unreadable expressions, i.e. there is no reason why it should be allowed.

So what kind of operator precedence leads to unreadable expressions?

Goals of an operator precedence

When is an operator precedence good?

I think there are two goals it should fulfill.

1. Operator precedence should be intuitive

Using operators is a really common operation in any kind of language. They are used by practically everyone - novices to gurus - so it is absolutely crucial to get them right.

Anyone who reads an expression like -3*4+22==a()+b[42] should be able to infer what it does. Otherwise, your language isn’t good.

If your language massively deviates from common idioms, you have a problem. Just imagine a language where a + b * c is (a + b) * c! There will be bugs everywhere.

Users of your language should never look at the operator precedence table. If they do, that’s a failed design.

2. Operator precedence should be useful

If there is a common usage and interaction of certain operators, the precedence should “just work”.

It is simply not beneficial if you have to use parentheses all the time. They just clutter the code and irritate whoever reads it.

The C programming language - and thus many derived languages - has a great example of “bad precedence” that annoys me anytime I use it. The precedence of the binary bitwise operators (&, |, …) is lower than that of the comparison operators (== or <).

I don’t know why and I hate the decision.

The reason is simple: Consider you have an enum of flags - each enumerator has a single bit set and you store a combination of flags in an integer by setting the bits. So you’d do this to set a flag:

unsigned flags = 0; // start with no flags set
flags |= enable_foo; // bitwise or to set

And you’d do this to check whether a flag is set:

// if there is any bit set in both flags and enable_foo,
// enable_foo is set
if (flags & enable_foo != 0)
    …

Except this does the wrong thing, as it is parsed as flags & (enable_foo != 0) which is flags & true.
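To make the failure concrete, here is a minimal sketch. The enumerator values and function names are assumptions made up for this example; the only thing that matters is that enable_foo is not the lowest bit.

```cpp
#include <cassert>

// Hypothetical flag values; each enumerator has a single bit set.
enum flag : unsigned {
    enable_foo = 1u << 1, // 0b010
    enable_bar = 1u << 2  // 0b100
};

// What `flags & enable_foo != 0` means to the compiler:
// != binds stronger than &, so this is flags & true, i.e. flags & 1.
inline bool is_foo_set_as_parsed(unsigned flags) {
    return flags & (enable_foo != 0);
}

// What was actually intended.
inline bool is_foo_set(unsigned flags) {
    return (flags & enable_foo) != 0;
}

inline void check_flag_test() {
    unsigned flags = 0;
    flags |= enable_foo;
    assert(is_foo_set(flags));            // the flag is set ...
    assert(!is_foo_set_as_parsed(flags)); // ... but 0b010 & 1 is 0, so the check misses it
}
```

The nasty part is that the misparsed check happens to work whenever the flag is the lowest bit, so the bug can stay hidden for a long time.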

Of course, you shouldn’t just use this kind of flag set in C++ directly. Check out my blog post about a type-safe flag set that catches operator misuse for more.

Another popular example is C++’s pointer-to-member dereference operator .*.

If you have no idea what I’m talking about, you haven’t missed anything.

If you have a pointer to a member variable mptr and want to dereference it given an obj, you write:

auto value = obj.*mptr;

However, if mptr is a pointer to a member function, you’d have to write:

auto result = (obj.*mptr)(args);

Yes, that’s right obj.*mptr(args) just won’t work. This is especially stupid as you can’t really do anything with the result of obj.*mptr - except call it! You can’t even store it in a variable.
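Here is a short sketch of both cases. Widget and its members are hypothetical names for illustration, and the last function assumes C++17 for std::invoke.

```cpp
#include <cassert>
#include <functional>

// Hypothetical type, only here to have something to point into.
struct Widget {
    int value = 42;
    int scaled(int factor) const { return value * factor; }
};

// Pointer to member variable: no extra parentheses needed.
inline int read_member(const Widget& obj) {
    int Widget::*mptr = &Widget::value;
    return obj.*mptr;
}

// Pointer to member function: the function call binds stronger than .*,
// so obj.*mptr(factor) would be read as obj.*(mptr(factor)) and not compile.
inline int call_member(const Widget& obj, int factor) {
    int (Widget::*mptr)(int) const = &Widget::scaled;
    return (obj.*mptr)(factor);
}

// std::invoke (C++17) hides the syntax entirely.
inline int call_member_with_invoke(const Widget& obj, int factor) {
    return std::invoke(&Widget::scaled, obj, factor);
}

inline void check_member_pointers() {
    Widget w;
    assert(read_member(w) == 42);
    assert(call_member(w, 2) == 84);
    assert(call_member_with_invoke(w, 3) == 126);
}
```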

These operator precedences definitely aren’t useful, so they should have been different.

A good operator precedence is impossible

We’ve identified the two goals of a good operator precedence: It should be intuitive and it should be useful.

But there’s a problem: these two goals are in conflict with each other.

Consider the binary & precedence: If we were to fix it by parsing flags & enable_foo != 0 as (flags & enable_foo) != 0, we would deviate from the common norm. While we would have created something more useful, it would also be unintuitive.

Or maybe it would be intuitive; I don’t think the C & precedence is intuitive.

Furthermore, the realm of what’s intuitive varies from person to person.

For example, it’s clear for me that a || b && c is a || (b && c) and not (a || b) && c, as logical and is written as multiplication in logic and logical or as a sum. However, given the fact that there’s a common C++ compiler warning if you write a || b && c without parentheses, it doesn’t seem to be general knowledge…
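The two readings really do give different results, as this small sketch shows. The function names are made up for illustration.

```cpp
#include <cassert>

// As written; this is the kind of line compilers commonly warn about.
inline bool as_written(bool a, bool b, bool c) { return a || b && c; }

// The grouping C++ uses: && binds stronger than ||.
inline bool and_first(bool a, bool b, bool c) { return a || (b && c); }

// The other possible reading.
inline bool or_first(bool a, bool b, bool c) { return (a || b) && c; }

inline void check_logical_grouping() {
    // With a true and c false the two readings disagree.
    assert(as_written(true, false, false) == and_first(true, false, false));
    assert(and_first(true, false, false));
    assert(!or_first(true, false, false));
}
```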

So what is universally considered intuitive?

Mathematical order of operations: * and / bind stronger than + and -. I think everyone is with me here.
Unary operators bind stronger than binary ones. It would be just insane if a + -b[42] + c were interpreted as (a + -b)([42] + c). However, we’re - already! - reaching a grey zone here, as shown with the pointer to member function example, where we’d want obj.*ptr() to be (obj.*ptr)(). On the other hand: it’s a pointer to member, and the only people who ever use those are implementers of things like std::function or std::invoke, so it’s fine to sacrifice operator .* and its even more insane cousin operator ->*.
… That’s it actually. Everything else is potentially ambiguous.

However, we can’t really assign an operator precedence based on that, we have to pick a relative ordering for all operators.

Or… do we?

Partially ordered operators

We don’t actually need a totally ordered operator precedence. It doesn’t make sense to ask “which binds stronger, & or /?”, as you rarely need to mix those two. If we try to answer those questions - as most languages do - we can’t really give an intuitive answer, simply because the situation is so abstract that no one has an intuition for it.

And even for operators that are used together - like && and || - it is difficult to give them a relative precedence while keeping it intuitive. So instead of picking a side, we can just pick none: Let them have the same precedence and make it an error to mix them without parentheses.

And then there are operators that are simply stupid to chain.

What does a == b == c do, for example? It doesn’t check whether all three are equal.

And what does 0 < a < 5 do?

You don’t actually want what those expressions do, as they don’t do what you think. Writing those expressions is not only useless, but actively dangerous. So it should be forbidden to write them.
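Here is a sketch of what the chained comparisons actually compute in C++, next to what was probably meant. The function names are hypothetical, and the bounds are parameters instead of the literals 0 and 5 so the grouping can be spelled out.

```cpp
#include <cassert>

// a == b == c groups as (a == b) == c: the bool result (0 or 1) is compared with c.
inline bool all_equal_as_parsed(int a, int b, int c) { return (a == b) == c; }
inline bool all_equal(int a, int b, int c) { return a == b && b == c; }

// lo < x < hi groups as (lo < x) < hi: 0 or 1 is compared with hi.
inline bool in_range_as_parsed(int lo, int x, int hi) { return (lo < x) < hi; }
inline bool in_range(int lo, int x, int hi) { return lo < x && x < hi; }

inline void check_chained_comparisons() {
    assert(all_equal(2, 2, 2));
    assert(!all_equal_as_parsed(2, 2, 2)); // (2 == 2) is 1, and 1 != 2
    assert(all_equal_as_parsed(3, 4, 0));  // (3 == 4) is 0, and 0 == 0

    assert(!in_range(0, 10, 5));
    assert(in_range_as_parsed(0, 10, 5));  // (0 < 10) is 1, and 1 < 5
}
```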

But what if you want to write a & b / c?

What if you want to write a && b || c?

And what if you truly want the behavior of a == b < c?

Then you use parentheses.

A carefully designed operator precedence enforces the common guideline of “use parentheses when it is not intuitive”. It is now impossible to write unclear expressions as the compiler will simply reject them.

Following this kind of idea, we get:

The final operator precedence

If we just take the most common operators, I identify the following “categories” of operators:

Logical operators: &&, ||, !
Comparison operators: ==, !=, <, <=, …
Mathematical operators: binary/unary + and -, *, and /.
Bitwise operators: ~, &, |, ^, << and >>
Other unary operators like function call, array subscript or member access

It makes sense to assign them the following relative precedence:

unary operators > mathematical/bitwise operators > comparison operators > logical operators

Note that we did have to make a few additional assumptions beyond the few I considered to be intuitive. In particular, a & b == c does not do what C does. But I think this kind of precedence is still reasonable.

The mathematical/bitwise operators have the same precedence, but it is actually an error to mix the two categories as they have no relative precedence to each other. Furthermore, unary ! has the strongest precedence but it only expects a unary expression, and things like !a + b are not allowed.

Inside the categories the relative precedence of the operators is as follows:

logical operators: ! > &&/||, but not mixed && and || chains
comparison operators: no chaining at all
mathematical operators: unary +/- > *// > +/-, with the usual associativity
bitwise operators: unary ~ before the binary operators, but again no mixed chaining of &, | and ^ and no chaining of the shift operators
unary operators: just as usual

What about assignment?

I think making assignment an expression was a mistake; it should be a statement. As such, I left it out of the operator precedence.

Then the following expressions are all well-formed:

a * b + c == foo & a
a && (!b || c)
array[a] + 32 < ~a | b

But these ones aren’t:

a & b + c
a << b + 1
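The rules for binary operators could be sketched roughly as follows. This is just an illustration, not a real parser: Category, category_of, and may_mix are names invented for this example, treating > and >= as comparison operators is an assumption, and so is rejecting a shift next to &, | or ^. The function answers one question: may two binary operators compete for the same operand, as in a op1 b op2 c, without parentheses?

```cpp
#include <cassert>
#include <string>

// Categories of binary operators, following the list above.
enum class Category { Logical, Comparison, Arithmetic, Bitwise, Unknown };

inline Category category_of(const std::string& op) {
    if (op == "&&" || op == "||") return Category::Logical;
    // ">" and ">=" are an assumption of this sketch.
    if (op == "==" || op == "!=" || op == "<" || op == "<=" ||
        op == ">" || op == ">=") return Category::Comparison;
    if (op == "+" || op == "-" || op == "*" || op == "/") return Category::Arithmetic;
    if (op == "&" || op == "|" || op == "^" || op == "<<" || op == ">>")
        return Category::Bitwise;
    return Category::Unknown;
}

inline bool is_shift(const std::string& op) { return op == "<<" || op == ">>"; }

// True if `lhs` and `rhs` may appear next to each other without parentheses.
inline bool may_mix(const std::string& lhs, const std::string& rhs) {
    const Category a = category_of(lhs);
    const Category b = category_of(rhs);
    if (a == Category::Unknown || b == Category::Unknown) return false;

    if (a != b) {
        // Arithmetic and bitwise operators share a level but have no relative
        // precedence, so mixing them is an error. All other pairs are ordered.
        const bool a_top = a == Category::Arithmetic || a == Category::Bitwise;
        const bool b_top = b == Category::Arithmetic || b == Category::Bitwise;
        return !(a_top && b_top);
    }

    switch (a) {
    case Category::Arithmetic: return true;        // * and / bind stronger than + and -
    case Category::Comparison: return false;       // no chaining at all
    case Category::Logical:    return lhs == rhs;  // && with &&, || with ||, never mixed
    case Category::Bitwise:    return lhs == rhs && !is_shift(lhs); // same operator, no shifts
    default:                   return false;
    }
}

inline void check_examples() {
    assert(may_mix("*", "+") && may_mix("+", "==") && may_mix("==", "&")); // a * b + c == foo & a
    assert(may_mix("+", "<") && may_mix("<", "|"));                        // array[a] + 32 < ~a | b
    assert(!may_mix("&", "+"));                                            // a & b + c
    assert(!may_mix("<<", "+"));                                           // a << b + 1
}
```

A real implementation would have to apply this check while parsing, because operators can become neighbours only after stronger-binding subexpressions are grouped: in a == b & c == d every adjacent pair is fine, yet the two == end up chained around b & c.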

Conclusion

If we use such an operator precedence in a language, we get a language where the compiler rejects expressions where you should have used parentheses. We’ve thus enforced the common guideline of using parentheses to make operators readable.

I couldn’t find a language that actually does this, the closest is Pony where it is illegal to mix any kind of operators without parentheses. However, that’s not a particularly useful operator precedence.

While statically enforcing guidelines in all cases is usually not a good idea (they are guidelines, after all), I think it is worth it here. At worst, you’d have to write parentheses where you wouldn’t have otherwise.

And I think that’s a good thing.

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.