foonathan::blog()

Thoughts from a C++ library developer.

Type safe - Zero overhead utilities for more type safety

Two weeks ago I blogged about using C++’s type system to prevent errors. The post spawned a lot of discussion, so I want to address some of the responses I got. I also said at the end of the post that I was going to write a library that helps to implement the techniques. The library is now done - type_safe can be found on Github, but please do read on for a discussion of the motivation and a feature overview.



Guideline II: Use proper argument types

Let’s talk about guideline II from the previous post again, because it’s the more important one and I kinda glossed over it in the last post. The overall goal is to minimize precondition errors. The most effective way to do that is to minimize preconditions - the fewer chances to make errors, the fewer errors.

Note that this does not mean to artificially widen the contract - like std::vector<T>::at() does with the exception on invalid index instead of operator[]’s UB. This simply means to choose a proper argument type - one that cannot express the invalid value. Then a possible precondition error is a type error and caught by the compiler!

I gave an example, suppose you have the following function:

/// \requires `ptr` must not be null.
void foo(int* ptr)
{
    assert(ptr);
}

foo() has a precondition - you must not pass nullptr. This precondition is documented and there is an assertion to verify it.

Some say: that’s the best way to communicate the precondition.

No. It is not.

The best way to communicate a precondition is with code. Code that needs comments is by definition worse than code that is just as clear but doesn’t need comments.

In this case the answer to the problem is simple: Use a reference.

void foo(int& ref);

Now there is no need to document a precondition because a reference can’t be null! You can still technically pass it null by dereferencing a null pointer, but that’s the caller’s fault. Furthermore, you cannot accidentally pass a null pointer, or any pointer for that matter. The compiler will complain that a reference is not a pointer, so the caller has to dereference the pointer. Every C++ programmer should be trained to automatically think whenever they write *ptr - Could it be possible that this pointer is null? Do I need to check for it? Do I handle it? This does not happen when they simply write foo(ptr). Thus by changing the type we eliminated a precondition and traded a possible runtime bug for a compile-time error.

And so far, people agreed.

But then I gave another example:

/// \requires `i >= 0`.
void foo(int i)
{
    assert(i >= 0);
}

Here foo()’s argument must not be negative. So, following the same guideline, we should change the type in order to prevent that precondition error from ever happening and to ensure that the compiler will remind us of the error instead of a crash at runtime.

What’s the type to represent non-negative integers? Exactly, unsigned:

void foo(unsigned i);

Now you cannot pass negative values and the compiler will complain if you do so.

Except it does not:

int i = 42;
foo(i); // works
i = -37;
foo(i); // works
foo(10); // works
foo(-10); // works

For some bizarre reason, someone decided that it is a good idea to silently and willingly convert every integer to unsigned whenever possible.

Hint: It’s not. It’s really not.

Instead of preventing a possible type error, now the bug is hidden and the function gets called with a gigantic value instead. This - among other issues with unsigned - led to a guideline by Bjarne himself (!) that you should not use unsigned for everyday use.
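To make the "gigantic value" concrete, here is a minimal sketch. The function received_by_foo is a hypothetical stand-in for foo(unsigned) that simply returns what the callee sees. The conversion from int to unsigned is defined as reduction modulo 2^N, where N is the number of bits in unsigned, so -10 arrives as the maximum unsigned value minus 9.

```cpp
#include <limits>

// Hypothetical stand-in for foo(unsigned): returns the value the callee sees.
constexpr unsigned received_by_foo(unsigned i)
{
    return i;
}

// What -10 becomes after the implicit int -> unsigned conversion:
// 2^N - 10, which is the same as max() - 9.
constexpr unsigned wrapped_minus_ten = std::numeric_limits<unsigned>::max() - 9u;
```

Calling received_by_foo(-10) compiles without an error and yields wrapped_minus_ten, which is exactly the hidden bug described above.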

But: If it’s broken, fix it, don’t just stop using it and pretend it doesn’t exist!

Thankfully C++ did not just inherit C’s mistakes - it also gave us ways to fix those mistakes.

And also many, many more mistakes - but that’s a different blog post.

That’s what I did.

type_safe::integer - a better integer type

The library provides a class template integer<T>. It is a wrapper around some integer type T, but better.

Let’s use that instead of plain, old unsigned:

void foo(ts::integer<unsigned> i);

In this and all other examples the namespace type_safe is aliased to ts.

Okay, using it now:

int i = 42;
foo(i); // error, i is not unsigned
i = -37;
foo(i); // error, i is not unsigned
foo(10); // error, 10 is not unsigned
foo(-10); // error, -10 is not unsigned

foo(10u); // alright, 10u is unsigned
foo(ts::integer<unsigned>(-42)); // haha, nice try
foo(-ts::integer<unsigned>(37)); // of course not (unary minus doesn't exist for unsigned)

Note that we’re talking about compile errors here. This is how unsigned should behave in the first place!

ts::integer<T> only accepts integers of the same signedness as T whose size is less than or equal to that of T. And “accepts” does not just refer to the constructor, no, to everything:

ts::integer<int> a(0); // btw, no default constructor
ts::integer<long long> b(10);
ts::integer<unsigned> c(0u); // have to use "u" suffix

b += a; // alright
a += b; // no, possible lossy conversion

a + b; // alright, result is `ts::integer<long long>`

c += 42; // nope, 42 is not unsigned

a = -1;
if (a < c) {} // haha, nice try, you may not compare!

It goes without saying that ts::integer<T> is fully constexpr enabled and has zero overhead, right?
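How can a wrapper reject such conversions at compile time? One possible approach is a constructor template that only takes part in overload resolution for "safe" source types. The following is a hypothetical sketch under that assumption, not type_safe's actual implementation; safe_integer and is_safe are made-up names.

```cpp
#include <type_traits>

// Hypothetical sketch, not type_safe's implementation: a wrapper that only
// accepts integers with the same signedness as T that are no larger than T.
template <typename T>
class safe_integer
{
    static_assert(std::is_integral<T>::value, "T must be an integer type");

    template <typename U>
    using is_safe = std::integral_constant<bool,
        std::is_integral<U>::value && !std::is_same<U, bool>::value
        && std::is_signed<U>::value == std::is_signed<T>::value
        && sizeof(U) <= sizeof(T)>;

public:
    // Only participates in overload resolution for "safe" source types,
    // so every other argument type is a compile error.
    template <typename U,
              typename = typename std::enable_if<is_safe<U>::value>::type>
    constexpr safe_integer(U value) : value_(value) {}

    constexpr T get() const { return value_; }

private:
    T value_;
};
```

With this sketch, safe_integer<unsigned> can be created from 10u but not from 10 or -10, and safe_integer<int> cannot be created from a long long. The same constraint could be applied to the arithmetic and comparison operators.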

In addition to those “sane” conversions, the implementation for unsigned ts::integers also fixes another problem with unsigned types: Over/underflow of a ts::integer<T> is always undefined behavior. In practice this means that:

ts::integer<unsigned> u(0);
--u;

Is a runtime error in debug mode and if assertions are disabled the compilers are able to perform similar optimizations as with signed integer types. Don’t believe me? See for yourself.

The compilers aren’t quite able to fully optimize it. If you know a better way to signal to the compiler that overflow is UB, tell me!
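One way to express "underflow is a bug" in plain C++ is an assertion for debug builds plus an unreachable hint for release builds. This is only a sketch under the assumption that GCC or Clang is used; checked_decrement is a hypothetical helper, not a type_safe function, and whether the hint actually enables further optimizations depends on the compiler.

```cpp
#include <cassert>

// Hypothetical helper, not part of type_safe: a decrement that treats
// wrapping below zero as a precondition violation.
constexpr unsigned checked_decrement(unsigned value)
{
    // With assertions enabled, a decrement that would wrap stops here.
    assert(value != 0u);
#if defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
    // With assertions disabled, promise the optimizer that this never happens.
    // __builtin_unreachable() is a GCC/Clang extension; reaching it is UB.
    if (value == 0u)
        __builtin_unreachable();
#endif
    return value - 1u;
}
```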

ts::boolean and ts::floating_point<T>

For completeness the library also provides a ts::boolean type and a ts::floating_point<T>. But these are “just” wrappers over bool and a floating point type, respectively, without the dangerous conversions.

Note that you cannot do arithmetic with ts::boolean or compare a ts::floating_point for equality with operator==().

ts::narrow_cast() and ts::make_(un)signed()

Of course sometimes you want to convert between dangerous types. For that there is ts::narrow_cast():

ts::integer<short> i = ts::narrow_cast<short>(42);
ts::floating_point<float> f = ts::narrow_cast<float>(0.1);

Have you spotted the bug?

0.1 is a double literal, so we cannot assign it to a type safe float directly.

Normally you wouldn’t use narrow_cast() for literals, only for other variables. The type of a literal is easily changed with a suffix.

But 0.1 cannot be represented exactly in IEEE-754 binary floating point, so the conversion from double to float loses precision. This is checked at runtime in debug mode and results in an error. If you really want to allow a possible loss, you have to be extra verbose:

ts::floating_point<float> f(static_cast<float>(0.1));

And if the value is not a literal:

ts::floating_point<float> f(static_cast<float>(static_cast<double>(d)));

Now, that’s a lot of typing!
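A plausible way to detect a lossy narrowing is a round trip: convert to the target type, convert back, and compare with the original. The sketch below assumes that strategy; it is not necessarily how ts::narrow_cast() checks, and converts_losslessly is a hypothetical name.

```cpp
// Hypothetical helper, not type_safe's narrow_cast(): reports whether
// converting `value` to `To` and back reproduces the original value.
template <typename To, typename From>
constexpr bool converts_losslessly(From value)
{
    return static_cast<From>(static_cast<To>(value)) == value;
}
```

Here converts_losslessly<short>(42) and converts_losslessly<float>(0.5) hold, while converts_losslessly<float>(0.1) does not, because the float closest to 0.1 differs from the double closest to 0.1. A round trip like this says nothing about sign changes, so signed/unsigned conversions need their own check.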

Note that ts::narrow_cast() still does not allow conversion between signed and unsigned. For that you have to use the ts::make_(un)signed functions:

ts::integer<unsigned> u(/* ... */);
ts::integer<int> i = ts::make_signed(u);
// likewise with make_unsigned()

Again this checks that the value fits into the target type in debug mode. There is also a ts::abs() whose return type is the corresponding unsigned ts::integer.

ts::constrained_type

Back to the guideline.

With the ts::integer<T>s you can follow it safely without hiding the bug. Once again the compiler will remind you if you try to pass any value that might be negative, forcing you to think.

But there are some constraints on a type that cannot be expressed with a built-in type. For those, there is ts::constrained_type:

using non_empty_string = ts::constrained_type<std::string, ts::constraints::non_empty>;

void foo(const non_empty_string& str);

foo() only accepts a std::string that is not empty. This constraint cannot be checked at compile time obviously, but the compiler is happy to remind you that there is some constraint:

foo("Hello world"); // error: const char* is not a non_empty_string
foo(std::string("Hello world")); // error: std::string is not a non_empty_string
foo(non_empty_string("Hello world")); // there ya go

Like before a compile error about a type mismatch hopefully encourages you to think whether that constraint is fulfilled. And if you don’t - no worries, a debug assertion is waiting for you.

You can of course completely customize the constraint verification with a third template parameter.

Because a non_empty_string has a constraint, you cannot modify it directly. There is a get_value() function but it returns a const T&. To modify it, you have to use modify():

auto modifier = str.modify();
modifier.get() += "bar";
modifier.get().clear();
modifier.get() = "foo";
// destructor of modifier checks constraint again

If you like lambdas, you can also use ts::with():

ts::with(str, [](std::string& s)
{
    // ...
});
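The idea behind such a wrapper can be sketched in a few lines. This is a hypothetical, simplified version and not type_safe's implementation: constrained, not_empty and non_empty_text are made-up names, and the modification goes through a callback instead of a separate modifier object.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical predicate in the spirit of ts::constraints::non_empty.
struct not_empty
{
    bool operator()(const std::string& s) const { return !s.empty(); }
};

// Hypothetical sketch of a constrained wrapper, not type_safe's implementation.
template <typename T, typename Constraint>
class constrained
{
public:
    // explicit: callers must spell out the constrained type at the call site.
    explicit constrained(T value) : value_(std::move(value)) { verify(); }

    // Read access only, so the invariant cannot be broken from outside.
    const T& get_value() const { return value_; }

    // Modify through a callback; the constraint is checked again afterwards.
    template <typename Func>
    void modify(Func&& f)
    {
        std::forward<Func>(f)(value_);
        verify();
    }

private:
    void verify() const { assert(Constraint{}(value_)); }

    T value_;
};

using non_empty_text = constrained<std::string, not_empty>;
```

Because the constructor is explicit, a plain std::string does not convert silently, and every mutation ends with another run of the predicate.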

The Constraint is simply a predicate but it can also do static checks. This is a simple implementation of GSL’s non_null<T*>:

using non_null_ptr = ts::constrained_type<int*, ts::constraints::non_null>;

non_null_ptr p(nullptr); // compilation error

Some constraints cannot be checked or are too expensive to check. For that there is ts::tagged_type:

using owning_ptr = ts::tagged_type<int*, ts::constraints::owner>;

owner isn’t really a predicate, it is just a tag type. This enables a technique Ben Deane calls phantom types.
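A phantom type can be sketched as a wrapper whose second template parameter is never used for storage. The names below (tagged, owner_tag, observer_tag) are hypothetical and only illustrate the technique; they are not type_safe's tagged_type.

```cpp
#include <type_traits>

// Hypothetical sketch of a phantom-type wrapper, not type_safe's tagged_type.
template <typename T, typename Tag>
class tagged
{
public:
    constexpr explicit tagged(T value) : value_(value) {}
    constexpr T get() const { return value_; }

private:
    T value_; // Tag is never stored, it only changes the type
};

// Tags are empty types that only exist to tell the wrappers apart.
struct owner_tag {};
struct observer_tag {};

using owning_int_ptr    = tagged<int*, owner_tag>;
using observing_int_ptr = tagged<int*, observer_tag>;
```

Both aliases wrap an int*, yet they are distinct types that do not convert into each other, so a function taking an owning_int_ptr cannot be called with an observing one by accident.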

Guideline I: Use a proper return type

In the last post I also complained about std::string::back(). It is very easy to misuse it and accidentally violate the precondition.

I argued that a better solution would be if the return type was not simply char but std::optional<char>. Then the function can always return something and there is no need for the precondition.

But people complained that I “went overboard” with that and that I was - again - artificially widening contracts. I agree that I widen the contract, but not artificially. I simply use a proper return type for a function that sometimes cannot return a value. The precondition is still there - it only moved to one central place: the value() function of the optional.

Using std::optional is once again a different type so the compiler reminds you that there might not be a value there. This is just the general C++ guideline to prefer compile-time errors over runtime errors. C++ gives you the tools to do that, so use them!

Scott Meyers repeatedly said: Make interfaces easy to use correctly and hard to use incorrectly. This is easy to use incorrectly:

char back(const std::string& str);

This is harder to use incorrectly:

std::optional<char> back(const std::string& str);

It is harder to use incorrectly because you can easily call the function without thinking too much, but you cannot easily access the value of the function without thinking too much.
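A possible implementation of the second signature, assuming C++17's std::optional is available. The name safe_back is hypothetical and chosen only to keep it apart from std::string::back().

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical free function (C++17): the "string is empty" case becomes
// part of the return type instead of a precondition.
inline std::optional<char> safe_back(const std::string& str)
{
    if (str.empty())
        return std::nullopt;
    return str.back();
}
```

The caller now has to decide what happens for an empty string, for example with safe_back(str).value_or('?').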

ts::optional<T> and ts::optional_ref<T>

type_safe also provides an optional. It is very similar to the standard version but has a few differences. For example, it does not provide the pointer-like access functions. But in addition it is monadic and provides map(), bind() and unwrap(), as well as some other functions.

With those, you do not need to actually call the value() function of the optional and do not run into its precondition there. For example, like std::optional<T> it provides a value_or() function that either returns the value or some fallback value if the optional is empty. But there is also a map() function:

ts::optional<int> opt = /* ... */;
ts::optional<char> mapped = opt.map([](int i) { return 'A' + i; });

If opt is empty, mapped is empty as well. Otherwise mapped contains the character 'A' + opt.value(). A more efficient map() that does not return a copy is ts::with():

ts::optional<int> opt = /* ... */;
ts::with(opt, [](int& i) { ++i; });

It gets an l-value reference and allows in-place modification of the value of the optional instead of returning a copy. Some functions you might want to use with map() return an optional themselves:

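ts::optional<int> opt = /* ... */;
ts::optional<ts::optional<char>> mapped = opt.map([](int i) -> ts::optional<char> {
    // both branches must have the same type, so no ?: with ts::nullopt
    if (i > 26)
        return ts::nullopt;
    return ts::optional<char>(static_cast<char>('A' + i));
});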
// a nested optional isn't nice but there's unwrap():
ts::optional<char> map_unwrap = mapped.unwrap();

unwrap() unwraps a nested optional. If the outer one is empty, the result is empty as well but of the nested type. Otherwise it is the value() of the outer one. The member function bind(f) is equivalent to map(f).unwrap().
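The relationship between the three operations can be sketched with free functions on C++17's std::optional. All names here (map_optional, unwrap_optional, bind_optional, to_letter) are hypothetical; the sketch illustrates the semantics described above and is not type_safe's code.

```cpp
#include <optional>
#include <type_traits>

// map: apply f to the value if there is one and wrap the result.
template <typename T, typename Func>
constexpr auto map_optional(const std::optional<T>& opt, Func f)
    -> std::optional<std::decay_t<decltype(f(*opt))>>
{
    if (opt)
        return f(*opt);
    return std::nullopt;
}

// unwrap: flatten optional<optional<T>> into optional<T>.
template <typename T>
constexpr std::optional<T> unwrap_optional(const std::optional<std::optional<T>>& nested)
{
    if (nested)
        return *nested;
    return std::nullopt;
}

// bind: map followed by unwrap, for functions that return an optional.
template <typename T, typename Func>
constexpr auto bind_optional(const std::optional<T>& opt, Func f)
{
    return unwrap_optional(map_optional(opt, f));
}

// Example function: letters exist only for the indices 0..25.
constexpr std::optional<char> to_letter(int i)
{
    if (i < 0 || i > 25)
        return std::nullopt;
    return static_cast<char>('A' + i);
}
```

Mapping to_letter over an optional holding 99 gives a non-empty outer optional with an empty inner one, while bind_optional collapses that into a single empty optional.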

std::variant comes with std::visit(). It calls a Visitor with the value stored in the variant. A ts::visit() for optional exists as well; it is a generalization of ts::with() that also calls the function if there is no value stored, passing it ts::nullopt.

There is also ts::optional_ref<T> that models an optional reference. It basically behaves like a pointer - you can even assign it nullptr in addition to nullopt to create the empty state - but has the same interface as ts::optional so you can use the same functions. ts::optional_ref<T> is also useful for arguments where you want a reference that might be null and a pointer is not the right modelling choice.

In fact, both ts::optional and ts::optional_ref are aliases for the generic ts::basic_optional<StoragePolicy>. The StoragePolicy stores the value, determines when it is invalid, which types are accepted etc. For example you can easily write your optional_int that uses the special value -1 to represent an invalid value, saving you the overhead for the storage + bool the generic ts::optional uses.
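The space saving of such a sentinel-based optional can be sketched without the library. The class below is hypothetical and does not follow type_safe's StoragePolicy interface; it only shows that reserving -1 as the empty state removes the need for a separate bool.

```cpp
#include <optional>

// Hypothetical sketch, not type_safe's StoragePolicy interface: an optional
// int that reserves -1 as its empty state and therefore needs no extra bool.
class optional_index
{
public:
    constexpr optional_index() : value_(-1) {}
    constexpr explicit optional_index(int value) : value_(value) {}

    constexpr bool has_value() const { return value_ != -1; }

    // Returns the stored value, or `fallback` if empty.
    constexpr int value_or(int fallback) const
    {
        return has_value() ? value_ : fallback;
    }

private:
    int value_; // -1 means "no value"
};
```

The sketch is exactly as large as an int, whereas std::optional<int> has to be larger because it stores an additional flag. The price is that -1 itself can no longer be stored as a value.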

Like everything else in type_safe there is no runtime overhead.



Conclusions

C++’s type system is amazing. It is just not amazing for the built-in types. But thankfully it provides the functionality to fix it.

The techniques I’ve shown you do not make C++ like Java with wide contracts and exceptions everywhere. Instead, they turn runtime errors into type errors, like languages such as Haskell do. Proper type design can completely remove entire classes of errors. The errors are still possible of course, but they can only happen after the programmer has been reminded by the compiler, which makes them less likely.

Furthermore, given a sufficiently smart compiler - i.e. newer GCC with -O1 - they have zero or even negative overhead. Some of the techniques are drastic and may seem weird. But this is just because that’s not the way low-level C or C++ code is usually written. This is a more “modern” way of thinking using functional paradigms. If you want to try it out, check out type_safe.