Implementation Challenge flag_set: Type-safe, hard to misuse bitmask

16 Mar 2017 by Jonathan

Sometimes when writing an API you need to pass various flags to a function. For example, when opening a file you can pass information like whether or not the file is opened for reading, writing, binary, write at the end etc. And often those flags can be combined arbitrarily.

Usually you’d implement that by using a bitmask: Each flag is a bit in an integer, they can be set/reset and toggled with bitwise operations. However, the naive implementation isn’t very good: I’ll explain why and show you how to do it better.

Bitmask

A bitmask is usually implemented like this:

enum flags
{
    a = 1,
    b = 2,
    c = 4,
    d = 8,
};

int some_flags = a | b;
some_flags &= ~b; // clear b
some_flags |= d; // set c

An enum is used to define the actual flag values. Each flag is represented by one bit, so the enumerators are assigned powers of two. And you can use bitwise operations directly with enums, so an integer with bit 1 and 2 set here is flag a and flag b.

However, this approach has multiple drawbacks. For starters, classical C enums aren’t scoped and convert to an int every chance they’ll get. Also after you’ve combined two flags, you don’t have an object of type flags anymore, but an int, so you’ll loose type safety.

We can fix those problems by using C++11’s enum class. But because this prevents conversion to the underlying integer type, this also prevents using the bitwise operators. We’d have to overload all of them individually:

flags operator~(const flags& f)
{
    return flags(~static_cast<int>(f));
}

flags operator|(const flags& a, const flags& b)
{
    return flags(static_cast<int>(a) | static_cast<flags>(b));
}

…

Now a combination of flags is an object of type flags, and not an int. The downside is a lot of work each time you want to define some flags. And this approach still isn’t perfect:

You still have to manually give each enumerator a different power of two. This is tedious manual work and it is easy to do a copy-paste error.

But more importantly, have you ever run into an error like this?

Gotcha of the day: To set bits a AND b you don't use `a & b`, but `a | b`, i.e. a OR b.
— Jonathan Müller (@foonathan) March 12, 2017

Bitwise operations aren’t very intuitive. It would be nice if there was a better API to set a flag or if it would somehow be possible to prevent these kinds of misuse.

So let’s do exactly that.

The general idea

As plain old C enums aren’t very safe, we want to use an enum class, but then we need to overload the operators. This is too much work, so they have to be generated automatically for enums we want to use as flags.

And when generating the operators with some kind of magic, we can think a little bit more out of the box. There is no need to return the enum directly from the bitwise operators, in fact we shouldn’t. If we return some kind of different type to represent a combination of multiple flags, we can write functions that should only accept one flag, and functions that can accept a combination of flags and the compiler will remind us if we make a mistake.

So let’s have a flag container, a flag_set. This type stores which flags are set and which aren’t. Like the enum itself, it can store that in an integer, where each bit represents one flag.

But how can we prevent accidental misuse?

For that, we have to take a step back and look at the bigger picture. As this stackoverflow answer points out, these are the operations you’d want to do:

Set a bit by writing set |= a
Clear/reset a bit by writing set &= ~a
Toggle a bit by writing set ^= a
Check for a bit by writing (set & a) != 0

a here is an integer where all bits are 0 except the given 0, i.e. some power of two representing the flag.

What you’ll notice is this: Reset is the only operation where you’ll use the complement operator, all others don’t have one. This is still true if you want to do this for two bits a and b:

Set by writing set |= a | b
Clear/reset by writing set &= ~(a | b) or set &= ~a & ~b (deMorgan’s law)
Toggle by writing set ^= a | b
Check by writing (set & (a | b) != 0

So to reset multiple you & the complements. It would be an error, however, to write a & b, as this would always be 0 for two individual, different flags.

With that we can identify two kinds of concepts: A flag combination and a flag mask. A flag combination is either an individual enumerator or multiple |ed together. You can use a flag combination to set, toggle and check for flags. A flag mask is a complemented flag combination. You can & them together and use it to clear flags.

With that in mind we can define two different types flag_combo and flag_mask. Like flag_set they also are containers of flags, but they have semantic information. The operator&= of flag_set can then only be overloaded for taking a flag_mask, so code like set &= a won’t compile, making it impossible to make that mistake.

But what if you truly want to write set &= a? Let’s look at the semantic meaning of “misusing” the operators:

set |= ~a - set everything except a
set &= a - clear everything except a
set ^= ~a - toggle everything except a
(set & ~a) != 0 - check for everything except a

So swapping the concepts around is useful if you have many flags and want to do something for all of them except one (or few). This is reasonable, so it should be allowed. It is not the normal behavior, however, so it should be more explicit.

We can easily write a function combo() that takes a mask and returns the appropriate combination, and mask() that does the opposite. Then the above behavior is still possible it just requires set &= mask(a).

Implementation

`flag_set_impl`

All three types flag_set, flag_combo and flag_mask basically have the same implementation. All three need to store multiple flags as bits in an integer.

So it makes sense to outsource that in a common class:

template <typename Enum, typename Tag = void>
class flag_set_impl
{
public:
    using traits   = flag_set_traits<Enum>;
    using int_type = typename select_flag_set_int<traits::size()>::type;

    …

private:
    static constexpr int_type mask(const Enum& e)
    {
        return int_type(int_type(1u) << static_cast<std::size_t>(e));
    }

    explicit constexpr flag_set_impl(int_type bits) : bits_(bits)
    {
    }

    int_type bits_;
};

Ignore the flag_set_traits business, I’ll get to it later.

As the three types share a common behavior, but it is very important that they are three distinct types, the flag_set_impl has a Tag parameter. This is just a dummy, but two instantiations with different types there are two different types, which allows overloading etc.

This is a workaround for missing strong typedefs.

We’ll store the bits in an integer, select_flag_set_int gives us that integer. It is the smallest unsigned integer type that has at least that many bits. The implementation just uses specializations, nothing too interesting.

This approach doesn’t work for enums with more than 64 flags, but I’ll solve that when it comes up.

One of the other problems I wanted to prevent is making a mistake when assigning the values to the enum flags. It can be prevented by simply keeping the default values. But then instead of being the corresponding mask directly, it is the index of the bit. The mask is easily created by shifting 1 the right number of times, which is what mask() does.

Mask here isn’t flag mask, just the integer that has the other bits “masked”.

static constexpr flag_set_impl all_set()
{
    return flag_set_impl(int_type((int_type(1) << traits::size()) - int_type(1)));
}
static constexpr flag_set_impl none_set()
{
    return flag_set_impl(int_type(0));
}

explicit constexpr flag_set_impl(const Enum& e) : bits_(mask(e))
{
}
template <typename Tag2>
explicit constexpr flag_set_impl(const flag_set_impl<Enum, Tag2>& other)
: bits_(other.bits_)
{
}

We’ll add two named constructors. One returns a flag_set_impl where no flags are set, one where all of them are. The second is more interesting: we can’t return the maximum value of the integer directly, as we might not use all bits of them directly. If the upper bits are 1s all_set() wouldn’t be equal to a | b | ... , as their upper bits are 0s. So we’ll shift 1 one more than we’ll have flags and subtract 1. This works and is works even if the enum uses all bits as unsigned overflow is well-defined.

We’ll also add two regular constructors, which aren’t interesting, as long as they are explicit.

constexpr flag_set_impl set(const Enum& e) const
{
    return flag_set_impl(bits_ | mask(e));
}
constexpr flag_set_impl reset(const Enum& e) const
{
    return flag_set_impl(bits_ & ~mask(e));
}
constexpr flag_set_impl toggle(const Enum& e) const
{
    return flag_set_impl(bits_ ^ mask(e));
}

Next are the important member functions to set/clear/toggle a single bit. They’re all straightforward and make use of the private constructor taking int_type. Note that they aren’t doing it in-place, rather they return a new flag_set_impl allowing them to work with C++11 constexpr rules.

Other member functions not shown are a toggle_all(), to_int() and is_set(), as well as bitwise_or(), bitwise_and() and bitwise_xor(). They’re all constexpr and not in-place and simply forward to the corresponding bitwise operations.

Note that the entire interface of this class is an implementation detail.

`flag_combo` and `flag_mask`

We can then create our two semantic flag containers:

template <typename Enum>
using flag_combo = flag_set_impl<Enum, struct combo_tag>;

template <typename Enum>
using flag_mask = flag_set_impl<Enum, struct mask_tag>;

As tag type we use an on the fly struct declaration, as it really isn’t important.

The only thing the user should now about are the bitwise operations, we overload them like this:

We can | two flag_combo objects as well as a combo with an enumerator, result is a flag_combo
We can & two flag_mask objects yielding a mask.
We can ~ a flag_combo or an enumerator yielding a mask.
We can ~ a flag_mask yielding a combo.
We can also compare two masks/combos for equality as well as a combo with an enumerator.

Implementation is very straightforward with the given interface as are the mask() and combo() conversions.

`flag_set`

flag_set is the important type for the user, it shouldn’t worry too much about the other ones. It uses flag_set_impl as a member and all functions simply forward to it.

flag_set provides the straightforward named member functions: set(),reset(),toggle() as well as set_all(),reset_all() and toggle_all(). Unlike flag_set_impl they work in-place as that’s more convenient for the user and set() also has a bool value overload.

It can also be created from a flag combination (i.e. flag_combo or enumerator) as well as assigned to:

template <typename FlagCombo, typename = detail::enable_flag_combo<FlagCombo, Enum>>
constexpr flag_set(const FlagCombo& combo) noexcept : flags_(combo)
{
}

detail::enable_flag_combo<FlagCombo, Enum> is a convenience alias for typename std::enable_if<is_flag_combo<T, Enum>::value>::type, and is_flag_combo is:

template <typename T, typename Enum>
struct is_flag_combo : std::false_type
{
};

template <typename Enum>
struct is_flag_combo<Enum, Enum> : flag_set_traits<Enum>
{
};

template <typename Enum>
struct is_flag_combo<flag_combo<Enum>, Enum> : flag_set_traits<Enum>
{
};

I’ll get back to the traits, otherwise it simply checks whether the argument is either the enum directly or a flag_combo<Enum>. So simple SFINAE ensures that the conversion only works for a | b and not ~a.

flag_set also provides the compound bitwise operations, |= and ^= are constrained like the constructor, &= requires a flag_mask, catching a potential mistake as I wanted.

A little bit more interesting are the non-compound operators. We can use identical overloads for operator|, operator^ and operator&, each returning the new flag_set, but then we’d miss one: using operator& to check whether bits are set. This operator& takes a flag combination not a mask and it also should return bool.

But this is trivial to add as a flag combination and a flag masks are two distinct types. Unlike other implementations I thus can get rid of the conversion to bool flag_set would need otherwise.

Automatically generating the overloads for the `enum`

We’ve done everything except one last piece is missing: There are still no bitwise operations for the enum directly, all we could overload are the ones taking at least one user-defined type.

flag_set_impl also needs to know how many flags are in an enum, in order to select the integer type and implement the all_set() constructor.

We can solve two problems at once by introducing the flag_set_traits. This is a class template that can be specialized for your own types, i.e. enums. It must provide a static constexpr function size() that returns the number of flags in the enum, used by the flag_set_impl.

And it can also be used to “generate” the bitwise operations. We can’t overload them directly, as we don’t know the type of the enum yet. So all we can do is write them as templates in a global scope.

Global scope is needed because ADL doesn’t help us here, as we don’t know the namespace of the flag.

But then every type would suddenly have an operator~, which could be a better match then the one they actually provide!

This is clearly a bad idea, so instead we can constrain the templates. We can use SFINAE to enable them only if the type is an enum with specialized flag_set_traits. Then they only apply where we actually want them. Detecting a specialization isn’t hard either, we can simply require that every specialization inherits from std::true_type and check flag_set_traits<Enum>::value.

Now this still isn’t a nice solution - it is still a global templated operator, but there aren’t nice solutions. The only other one besides “do it manually” is with a macro.

With that technique, we can add the missing missing operators:

template <typename Enum, typename = type_safe::detail::enable_flag<Enum>>
constexpr type_safe::flag_mask<Enum> operator~(const Enum& e) noexcept
{
    return type_safe::flag_mask<Enum>::all_set().reset(e);
}

template <typename Enum, typename = type_safe::detail::enable_flag<Enum>>
constexpr type_safe::flag_combo<Enum> operator|(const Enum& a, const Enum& b) noexcept
{
    return type_safe::flag_combo<Enum>(a) | b;
}

We need to create a mask when building the complement of a flag, and a combination when we or two together.

Automatically using a correct `flag_set_traits`

The approach with the flag_set_traits works and is non-intrusive. It is a little bit ugly, however: When you define your enum you’ll have to close the namespace, open the namespace of the flag_set_traits, specialize it, and then open the original one again, if you need to add anything else.

It would be better if the default flag_set_traits specialization would work on its own. This can be done as well, on the cost of making it intrusive. The default flag_set_traits can check whether the argument is an enum and whether it has a special enumerator, i.e. _flag_set_size. If that’s the case it inherits from std::true_type and uses _flag_set_size as the return value for size(), else it inherits from std::false_type.

Conclusion

We’ve now created a way to implement flags simply by writing the following code:

enum class flags
{
    a,
    b,
    c,
    …
    _flag_set_size
};

There is no need to assign powers of two, no need to use a macro or overload operators. It just works out of the box.

Furthermore it uses the type system to give the bitwise operations semantic information, so that the compiler can check common mistakes when misusing the operators. But unless the user deliberately want to make the “mistake”, it doesn’t need to care, as the use of the types are hidden away.

The full implementation is part of my type_safe library, and can be found here.

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.

Bitmask

The general idea

Implementation

flag_set_impl

flag_combo and flag_mask

flag_set

Automatically generating the overloads for the enum