foonathan::blog()

Thoughts from a C++ library developer.

Prevent precondition errors with the C++ type system

In the previous part of the error handling series I talked about assertions and wrote a debug assert library that provides flexible assertions.

Assertions are a useful tool to check the preconditions of functions - but proper type design can prevent situations where assertions are needed in the first place. C++ has a great type system; let’s use it to our advantage.

At the recent CppCon, Ben Deane gave a - as far as I’ve heard - great talk about type design. Sadly, I didn’t attend the conference and the video hasn’t been released yet, but according to the slides there is some overlap between his talk and what I’m going to say. But because I’ve planned this post for weeks, and even wrote the entire series just for it, I decided to publish it anyway. After all: some things cannot be said often enough.

Also, I’m going to focus explicitly on type design for error handling, while his talk seems to be more general.



Motivation

I’m working on standardese, a C++ documentation generator. As is the nature of those things, I have to deal with a lot of strings there. In particular, a common task is to erase whitespace at the end of a string. Because this can be done in a very simple way and the definition of “whitespace” varies from situation to situation, I didn’t bother to write a separate function for it.

In hindsight I really should have.

I’m using code like this:

while (is_whitespace(str.back()))
    str.pop_back();

I’ll write the two lines, commit them, push, and after the usual amount of waiting for CI I’ll get an email telling me that the Windows build has failed. I’m puzzled - it worked on my machine and on all Linux and macOS builds! - and look through the log: test execution has apparently timed out.

Now I’m fed up, reboot into Windows and build the project there. Running the tests gives me the wonderfully designed debug assertion failure dialog.

The one where they say “Retry” means “Debug”.

Looking at the error message I facepalm and commit the fix:

while (!str.empty() && is_whitespace(str.back()))
    str.pop_back();

Sometimes the string was empty. libstdc++ doesn’t enable assertions for that by default, and it just so happened to work as expected. But MSVC has assertions enabled and catches it.

I made this very error 3 times - I really should have written the function.
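A minimal sketch of such a helper might look like the following. The name `erase_trailing` and the choice to take the whitespace test as a predicate are assumptions for illustration, not standardese’s actual code; the predicate is a parameter because what counts as whitespace varies between call sites.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pops characters off the end of `str` while `pred`
// returns true for the last one. The predicate is a parameter because
// what counts as "whitespace" differs between call sites.
template <typename Predicate>
void erase_trailing(std::string& str, Predicate pred)
{
    // Check empty() first: calling back() on an empty string is UB.
    while (!str.empty() && pred(str.back()))
        str.pop_back();
}
```

With `erase_trailing(s, [](char c) { return c == ' ' || c == '\t'; })`, a string `"text \t "` becomes `"text"`, and an empty string is simply left alone. The empty check is written exactly once.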

A couple of things went wrong there: I didn’t follow DRY, libstdc++ doesn’t verify preconditions by default, AppVeyor doesn’t like graphical assertion dialogs, and MSVC isn’t available on Linux.

But I’d argue that the main fault lies in the design of std::string::back(). If it were properly designed, the code wouldn’t compile and would remind me that the string might be empty, saving me 15 minutes and a reboot into Windows.

How? With the help of the type system.

A solution

In simplified form, the function in question has this signature:

char& back();

It returns the last character of the string. If the string is empty, there is no last character, and thus it is UB to call it. How do you know that? It seems obvious if you think about it. I mean: which char should it return for an empty string? There is no “invalid” char, so there is nothing sensible it could return.

Well, there is \0, but it can be the last character of a std::string, so you could not distinguish the two situations.

But I didn’t think about it. I was busy thinking about this complicated comment parsing algorithm and fed up with the fact that some people put trailing whitespace in their comments, which breaks the subsequent markdown parsing!

back() has a narrow contract - a precondition. Functions with a narrow contract are without doubt more difficult to work with than functions with a wide contract. It is thus a worthwhile goal to make as few contracts narrow as possible.

In this particular function the problem is that back() does not have a valid character to return in case of an empty string. But there is one C++17 addition that can help this poor function: std::optional:

std::optional<char> back();

A std::optional can either contain a value or no value. It provides an invalid state for types where every value is valid. If the string isn’t empty, back() returns an optional that contains the last character. But if the string is empty, it can return a null optional. We’ve properly modelled the function so that we do not need the precondition anymore.
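The real std::string::back() cannot be changed, of course, but the idea can be sketched as a free function. `back_or_none()` is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical free function sketching the proposed signature:
// instead of a precondition, an empty string yields an empty optional.
inline std::optional<char> back_or_none(const std::string& str)
{
    if (str.empty())
        return std::nullopt; // no last character to return
    return str.back();       // safe: we just checked empty()
}
```

Calling it on `""` yields an empty optional instead of UB, so the caller has to decide what an empty string means before getting at the character.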

Note that we’ve also lost the ability to use back() as an l-value, because std::optional<T&> is not allowed. std::optional really isn’t great, more on that later.

Assume std::string::back() had this signature. Now I’m again concentrating on my comment parsing code and write the quick two-liner to erase trailing whitespace:

while (is_whitespace(str.back()))
    str.pop_back();

is_whitespace() takes a char but back() returns std::optional<char>, so I’ll get a compile error - on my machine, immediately. The compiler has caught a possible bug for me, statically, with only the type system! I’m automatically reminded that the string might be empty and have to do extra work to get the character.

Of course I can still mess it up - because std::optional really isn’t designed for this purpose:

while (is_whitespace(*str.back()))

This has the exact same behavior and will probably yield a debug assertion on MSVC. std::optional<T>::operator* returns the contained value and must not be called on a null optional. Slightly better would be:

while (is_whitespace(str.back().value()))

std::optional<T>::value() is at least defined to throw an exception (std::bad_optional_access) on an empty optional, so it will reliably fail at runtime. But neither solution brings any benefit over the code with the original signature. These member functions are so bad and poke such holes in the wonderful abstraction that they shouldn’t exist in the first place! Instead, there should be more high-level functions that make it unnecessary to actually query the value. And for the few cases where it might be needed, it should be a non-member function with a long name that stands out and makes you aware that you are doing something bad - and not a single star!

std::optional really isn’t great. It was simply designed as an alternative to std::unique_ptr<T> that doesn’t allocate memory, nothing more, nothing less. It is a pointer type, not the Maybe monad it could have been. This makes it unsuitable for many purposes where you want the monad - like this one.
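As an illustration of such a higher-level function, here is a sketch of a hypothetical `holds_and()` that applies a predicate to the contained value and treats an empty optional as `false`, so the call site never dereferences anything. The names `holds_and()` and `erase_trailing_spaces()` are assumptions, not part of any library:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical high-level helper: true only if `opt` holds a value
// and `pred` accepts it. The only dereference is hidden in here.
template <typename T, typename Predicate>
bool holds_and(const std::optional<T>& opt, Predicate pred)
{
    return opt.has_value() && pred(*opt);
}

inline void erase_trailing_spaces(std::string& str)
{
    // Local stand-in for an optional-returning back().
    auto last = [&]() -> std::optional<char> {
        if (str.empty())
            return std::nullopt;
        return str.back();
    };

    // No star, no value(): an empty string simply ends the loop.
    while (holds_and(last(), [](char c) { return c == ' '; }))
        str.pop_back();
}
```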

A much better solution would be this one:

while (is_whitespace(str.back().value_or('\0')))

std::optional<T>::value_or() returns either the contained value or the given alternative. In this case a null optional yields the null character, which just so happens to be a perfect value to terminate the loop. But of course there isn’t always a proper invalid value. So the best solution would be to change the signature of is_whitespace() to accept a std::optional<char>.
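A minimal sketch of that approach, assuming a hypothetical `is_whitespace()` overload taking `std::optional<char>` and a hypothetical `back_or_none()` free function standing in for an optional-returning `back()` (neither is part of the standard library):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical is_whitespace() that accepts an optional character.
// A missing character is simply "not whitespace".
inline bool is_whitespace(std::optional<char> c)
{
    return c.has_value() && (*c == ' ' || *c == '\t' || *c == '\n');
}

// Hypothetical stand-in for an optional-returning back().
inline std::optional<char> back_or_none(const std::string& str)
{
    if (str.empty())
        return std::nullopt;
    return str.back();
}

inline void erase_trailing_whitespace(std::string& str)
{
    // The original two-liner, now safe on empty strings:
    // the optional flows straight into is_whitespace().
    while (is_whitespace(back_or_none(str)))
        str.pop_back();
}
```

Because a plain `char` converts implicitly to `std::optional<char>`, existing calls like `is_whitespace('a')` keep compiling.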

Guideline I: Use a proper return type

There are many functions that either return something or must not be called at all. back()/front() are examples of that. For those, consider designing them so that they return an optional type like std::optional<T>. Then you don’t need a precondition check, and the type system itself helps prevent errors and makes it easier for the user to detect and handle the error.

Of course you cannot use std::optional<T> everywhere you might run into an error. Some errors aren’t precondition errors. In those situations, either throw an exception or use something similar to the proposed std::expected<T, E> (since standardized in C++23), which holds either a valid value or an error.

But for the functions that return something and must not be called in an invalid state, consider returning an optional type.

Parameter preconditions

We’ve dealt with preconditions on invalid states, but most preconditions are on the parameters. By changing the parameter type, you can easily get rid of those preconditions as well.

For example, consider this function:

void foo(T* ptr)
{
    assert(ptr);
    // ...
}

Change the signature to:

void foo(T& ref);

Now you cannot pass a null pointer anymore, and if you try, it is the caller’s fault for invoking UB by dereferencing it.
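A small sketch of the change, using a hypothetical `widget` type and `reset()` function:

```cpp
#include <cassert>

struct widget // hypothetical example type
{
    int value = 0;
};

// Before: a pointer parameter with a "must not be null" precondition.
// void reset(widget* ptr) { assert(ptr); ptr->value = 0; }

// After: a reference cannot be null in a well-formed program,
// so the precondition and its assert disappear.
inline void reset(widget& w)
{
    w.value = 0;
}
```

The null check has not vanished; it has moved to the call site, where the caller has to dereference a pointer explicitly before it can call `reset()`.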

This also works with more than just pointers:

void foo(int value)
{
    assert(value >= 0);
    // ...
}

Change the signature to:

void foo(unsigned value);

Now you cannot pass a negative value without it silently wrapping around to a large positive one. C++ sadly inherited the implicit conversion from signed to unsigned types from C, so the solution isn’t perfect, but it documents the intent.
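The following sketch, with a hypothetical `identity()` function, shows why the solution isn’t perfect: passing `-1` compiles and arrives as the largest `unsigned` value. GCC and Clang can warn about this with `-Wsign-conversion`, which `-Wall` does not enable.

```cpp
#include <cassert>
#include <limits>

// Hypothetical function whose parameter type documents "non-negative".
inline unsigned identity(unsigned value)
{
    return value;
}

// The signed-to-unsigned conversion is implicit: -1 is accepted and
// converted modulo 2^N, i.e. to the largest unsigned value.
inline bool negative_one_wraps()
{
    return identity(-1) == std::numeric_limits<unsigned>::max();
}
```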

Guideline II: Use proper argument types

Choose your argument types so that preconditions can be eliminated and instead expressed in the code directly. Have a pointer that must not be null? Pass a reference. An integer that must not be negative? Make it unsigned. An integer that can only have a certain, named set of values? Make it an enumeration.

You can even go so far as to write yourself a general wrapper type whose - explicit! - constructor asserts that the “raw” value satisfies a certain condition, like so:
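For the enumeration case, a sketch with a hypothetical `alignment` enumeration and `left_padding()` function: instead of an `int` documented as “must be 0, 1 or 2”, the parameter type itself lists the valid values.

```cpp
#include <cassert>

// Hypothetical scoped enumeration naming the only valid values.
enum class alignment
{
    left,
    center,
    right,
};

// Returns how much of `free_space` goes to the left of the text.
inline int left_padding(alignment a, int free_space)
{
    switch (a)
    {
    case alignment::left:
        return 0;
    case alignment::center:
        return free_space / 2;
    case alignment::right:
        return free_space;
    }
    return 0; // only reachable via a static_cast from an unnamed value
}
```

A `static_cast` from an arbitrary integer can still produce an unnamed value, so this is not airtight either, but such a cast is visible at the call site.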

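#include <cassert>
#include <string>
#include <utility>

class non_empty_string
{
public:
    // explicit: a raw std::string must be converted deliberately,
    // and this is the one place where the precondition is checked
    explicit non_empty_string(std::string str)
    : str_(std::move(str))
    {
        assert(!str_.empty());
    }

    const std::string& get() const
    {
        return str_;
    }

    // other functions you might want

private:
    std::string str_;
};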

It is very easy to generalize this little wrapper. Using it expresses intent and provides one central place to check validity. You can then also easily distinguish between already checked values and possibly invalid ones, making the preconditions obvious without documentation.

Of course this technique isn’t always possible. Sometimes you need a certain type by convention. In addition, using it everywhere can be overkill as well: if there is just one place where you require a certain precondition, there is not much need to write all the boilerplate.
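One possible generalization, sketched under the assumption that the condition can be expressed as a default-constructible predicate type. The names `checked` and `is_non_empty` are hypothetical, and the alias below recreates `non_empty_string` from the generic template:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical generic wrapper: a T whose validity is checked once,
// in the explicit constructor, by the predicate type Predicate.
template <typename T, typename Predicate>
class checked
{
public:
    explicit checked(T value)
    : value_(std::move(value))
    {
        assert(Predicate{}(value_)); // the single central check
    }

    const T& get() const
    {
        return value_;
    }

private:
    T value_;
};

struct is_non_empty
{
    bool operator()(const std::string& str) const
    {
        return !str.empty();
    }
};

using non_empty_string = checked<std::string, is_non_empty>;

// Functions taking the wrapper can rely on the invariant without re-checking.
inline char first_char(const non_empty_string& str)
{
    return str.get().front(); // safe: the constructor asserted non-emptiness
}
```

Like the original, the check is an `assert`, so it is only active in builds without `NDEBUG`.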



Conclusion

The C++ type system is powerful enough to help you catch errors.

Proper function design can remove many preconditions from the function itself and instead put them into one centralized place. Choose semantic argument types that can express the preconditions naturally and optional return types if the function sometimes cannot return a valid value.

While writing this post, I yet again came up with a library idea, like in the last post. I might write a small library that makes it easy to use “semantic types” expressing preconditions in a natural way. But I didn’t want to delay this post further, so I haven’t done it (yet).