Implementation Challenge flag_set: Type-safe, hard to misuse bitmask
Sometimes when writing an API you need to pass various flags to a function. For example, when opening a file you can pass information like whether or not the file is opened for reading, writing, binary, write at the end etc. And often those flags can be combined arbitrarily.
Usually you’d implement that by using a bitmask: Each flag is a bit in an integer, they can be set/reset and toggled with bitwise operations. However, the naive implementation isn’t very good: I’ll explain why and show you how to do it better.
Bitmask
A bitmask is usually implemented like this:
enum flags
{
a = 1,
b = 2,
c = 4,
d = 8,
};
int some_flags = a | b;
some_flags &= ~b; // clear b
some_flags |= d; // set c
An enum
is used to define the actual flag values.
Each flag is represented by one bit,
so the enumerators are assigned powers of two.
And you can use bitwise operations directly with enum
s,
so an integer with bit 1 and 2 set here is flag a
and flag b
.
However, this approach has multiple drawbacks.
For starters, classical C enum
s aren’t scoped and convert to an int
every chance they’ll get.
Also after you’ve combined two flags, you don’t have an object of type flags
anymore,
but an int
, so you’ll loose type safety.
We can fix those problems by using C++11’s enum class
.
But because this prevents conversion to the underlying integer type,
this also prevents using the bitwise operators.
We’d have to overload all of them individually:
flags operator~(const flags& f)
{
return flags(~static_cast<int>(f));
}
flags operator|(const flags& a, const flags& b)
{
return flags(static_cast<int>(a) | static_cast<flags>(b));
}
…
Now a combination of flags is an object of type flags
, and not an int
.
The downside is a lot of work each time you want to define some flags.
And this approach still isn’t perfect:
You still have to manually give each enumerator a different power of two. This is tedious manual work and it is easy to do a copy-paste error.
But more importantly, have you ever run into an error like this?
Bitwise operations aren’t very intuitive. It would be nice if there was a better API to set a flag or if it would somehow be possible to prevent these kinds of misuse.
So let’s do exactly that.
The general idea
As plain old C enum
s aren’t very safe, we want to use an enum class
,
but then we need to overload the operators.
This is too much work, so they have to be generated automatically
for enum
s we want to use as flags.
And when generating the operators with some kind of magic,
we can think a little bit more out of the box.
There is no need to return the enum
directly from the bitwise operators,
in fact we shouldn’t.
If we return some kind of different type to represent a combination of multiple flags,
we can write functions that should only accept one flag,
and functions that can accept a combination of flags
and the compiler will remind us if we make a mistake.
So let’s have a flag container, a flag_set
.
This type stores which flags are set and which aren’t.
Like the enum
itself, it can store that in an integer,
where each bit represents one flag.
But how can we prevent accidental misuse?
For that, we have to take a step back and look at the bigger picture. As this stackoverflow answer points out, these are the operations you’d want to do:
- Set a bit by writing
set |= a
- Clear/reset a bit by writing
set &= ~a
- Toggle a bit by writing
set ^= a
- Check for a bit by writing
(set & a) != 0
a
here is an integer where all bits are0
except the given0
, i.e. some power of two representing the flag.
What you’ll notice is this:
Reset is the only operation where you’ll use the complement operator,
all others don’t have one.
This is still true if you want to do this for two bits a
and b
:
- Set by writing
set |= a | b
- Clear/reset by writing
set &= ~(a | b)
orset &= ~a & ~b
(deMorgan’s law) - Toggle by writing
set ^= a | b
- Check by writing
(set & (a | b) != 0
So to reset multiple you &
the complements.
It would be an error, however, to write a & b
,
as this would always be 0
for two individual, different flags.
With that we can identify two kinds of concepts:
A flag combination and a flag mask.
A flag combination is either an individual enumerator or multiple |
ed together.
You can use a flag combination to set, toggle and check for flags.
A flag mask is a complemented flag combination.
You can &
them together and use it to clear flags.
With that in mind we can define two different types flag_combo
and flag_mask
.
Like flag_set
they also are containers of flags,
but they have semantic information.
The operator&=
of flag_set
can then only be overloaded for taking a flag_mask
,
so code like set &= a
won’t compile,
making it impossible to make that mistake.
But what if you truly want to write set &= a
?
Let’s look at the semantic meaning of “misusing” the operators:
set |= ~a
- set everything excepta
set &= a
- clear everything excepta
set ^= ~a
- toggle everything excepta
(set & ~a) != 0
- check for everything excepta
So swapping the concepts around is useful if you have many flags and want to do something for all of them except one (or few). This is reasonable, so it should be allowed. It is not the normal behavior, however, so it should be more explicit.
We can easily write a function combo()
that takes a mask and returns the appropriate combination,
and mask()
that does the opposite.
Then the above behavior is still possible it just requires set &= mask(a)
.
Implementation
flag_set_impl
All three types flag_set
, flag_combo
and flag_mask
basically have the same implementation.
All three need to store multiple flags as bits in an integer.
So it makes sense to outsource that in a common class:
template <typename Enum, typename Tag = void>
class flag_set_impl
{
public:
using traits = flag_set_traits<Enum>;
using int_type = typename select_flag_set_int<traits::size()>::type;
…
private:
static constexpr int_type mask(const Enum& e)
{
return int_type(int_type(1u) << static_cast<std::size_t>(e));
}
explicit constexpr flag_set_impl(int_type bits) : bits_(bits)
{
}
int_type bits_;
};
Ignore the
flag_set_traits
business, I’ll get to it later.
As the three types share a common behavior, but it is very important that they are three distinct types,
the flag_set_impl
has a Tag
parameter.
This is just a dummy, but two instantiations with different types there are two different types,
which allows overloading etc.
This is a workaround for missing strong typedefs.
We’ll store the bits in an integer, select_flag_set_int
gives us that integer.
It is the smallest unsigned
integer type that has at least that many bits.
The implementation just uses specializations, nothing too interesting.
This approach doesn’t work for
enum
s with more than 64 flags, but I’ll solve that when it comes up.
One of the other problems I wanted to prevent is making a mistake when assigning the values to the enum
flags.
It can be prevented by simply keeping the default values.
But then instead of being the corresponding mask directly,
it is the index of the bit.
The mask is easily created by shifting 1
the right number of times,
which is what mask()
does.
Mask here isn’t flag mask, just the integer that has the other bits “masked”.
static constexpr flag_set_impl all_set()
{
return flag_set_impl(int_type((int_type(1) << traits::size()) - int_type(1)));
}
static constexpr flag_set_impl none_set()
{
return flag_set_impl(int_type(0));
}
explicit constexpr flag_set_impl(const Enum& e) : bits_(mask(e))
{
}
template <typename Tag2>
explicit constexpr flag_set_impl(const flag_set_impl<Enum, Tag2>& other)
: bits_(other.bits_)
{
}
We’ll add two named constructors.
One returns a flag_set_impl
where no flags are set, one where all of them are.
The second is more interesting: we can’t return the maximum value of the integer directly,
as we might not use all bits of them directly.
If the upper bits are 1
s all_set()
wouldn’t be equal to a | b | ...
,
as their upper bits are 0
s.
So we’ll shift 1
one more than we’ll have flags and subtract 1
.
This works and is works even if the enum
uses all bits as unsigned
overflow is well-defined.
We’ll also add two regular constructors, which aren’t interesting,
as long as they are explicit
.
constexpr flag_set_impl set(const Enum& e) const
{
return flag_set_impl(bits_ | mask(e));
}
constexpr flag_set_impl reset(const Enum& e) const
{
return flag_set_impl(bits_ & ~mask(e));
}
constexpr flag_set_impl toggle(const Enum& e) const
{
return flag_set_impl(bits_ ^ mask(e));
}
Next are the important member functions to set/clear/toggle a single bit.
They’re all straightforward and make use of the private
constructor taking int_type
.
Note that they aren’t doing it in-place,
rather they return a new flag_set_impl
allowing them to work with C++11 constexpr
rules.
Other member functions not shown are a toggle_all()
, to_int()
and is_set()
,
as well as bitwise_or()
, bitwise_and()
and bitwise_xor()
.
They’re all constexpr
and not in-place and simply forward to the corresponding bitwise operations.
Note that the entire interface of this class is an implementation detail.
flag_combo
and flag_mask
We can then create our two semantic flag containers:
template <typename Enum>
using flag_combo = flag_set_impl<Enum, struct combo_tag>;
template <typename Enum>
using flag_mask = flag_set_impl<Enum, struct mask_tag>;
As tag type we use an on the fly struct
declaration, as it really isn’t important.
The only thing the user should now about are the bitwise operations, we overload them like this:
- We can
|
twoflag_combo
objects as well as a combo with an enumerator, result is aflag_combo
- We can
&
twoflag_mask
objects yielding a mask. - We can
~
aflag_combo
or an enumerator yielding a mask. - We can
~
aflag_mask
yielding a combo. - We can also compare two masks/combos for equality as well as a combo with an enumerator.
Implementation is very straightforward with the given interface as are the mask()
and combo()
conversions.
flag_set
flag_set
is the important type for the user,
it shouldn’t worry too much about the other ones.
It uses flag_set_impl
as a member and all functions simply forward to it.
flag_set
provides the straightforward named member functions:
set()
,reset()
,toggle()
as well as set_all()
,reset_all()
and toggle_all()
.
Unlike flag_set_impl
they work in-place as that’s more convenient for the user
and set()
also has a bool value
overload.
It can also be created from a flag combination (i.e. flag_combo
or enumerator)
as well as assigned to:
template <typename FlagCombo, typename = detail::enable_flag_combo<FlagCombo, Enum>>
constexpr flag_set(const FlagCombo& combo) noexcept : flags_(combo)
{
}
detail::enable_flag_combo<FlagCombo, Enum>
is a convenience alias for typename std::enable_if<is_flag_combo<T, Enum>::value>::type
,
and is_flag_combo
is:
template <typename T, typename Enum>
struct is_flag_combo : std::false_type
{
};
template <typename Enum>
struct is_flag_combo<Enum, Enum> : flag_set_traits<Enum>
{
};
template <typename Enum>
struct is_flag_combo<flag_combo<Enum>, Enum> : flag_set_traits<Enum>
{
};
I’ll get back to the traits, otherwise it simply checks whether the argument is either the enum
directly
or a flag_combo<Enum>
.
So simple SFINAE ensures that the conversion only works for a | b
and not ~a
.
flag_set
also provides the compound bitwise operations,
|=
and ^=
are constrained like the constructor, &=
requires a flag_mask
,
catching a potential mistake as I wanted.
A little bit more interesting are the non-compound operators.
We can use identical overloads for operator|
, operator^
and operator&
,
each returning the new flag_set
,
but then we’d miss one:
using operator&
to check whether bits are set.
This operator&
takes a flag combination not a mask
and it also should return bool
.
But this is trivial to add as a flag combination and a flag masks are two distinct types.
Unlike other implementations I thus can get rid of the conversion to bool
flag_set
would need otherwise.
Automatically generating the overloads for the enum
We’ve done everything except one last piece is missing:
There are still no bitwise operations for the enum
directly,
all we could overload are the ones taking at least one user-defined type.
flag_set_impl
also needs to know how many flags are in an enum
,
in order to select the integer type and implement the all_set()
constructor.
We can solve two problems at once by introducing the flag_set_traits
.
This is a class template that can be specialized for your own types,
i.e. enum
s.
It must provide a static constexpr
function size()
that returns the number of flags in the enum
,
used by the flag_set_impl
.
And it can also be used to “generate” the bitwise operations.
We can’t overload them directly, as we don’t know the type of the enum
yet.
So all we can do is write them as templates in a global scope.
Global scope is needed because ADL doesn’t help us here, as we don’t know the namespace of the flag.
But then every type would suddenly have an operator~
,
which could be a better match then the one they actually provide!
This is clearly a bad idea,
so instead we can constrain the templates.
We can use SFINAE to enable them only if the type is an enum
with specialized flag_set_traits
.
Then they only apply where we actually want them.
Detecting a specialization isn’t hard either,
we can simply require that every specialization inherits from std::true_type
and check flag_set_traits<Enum>::value
.
Now this still isn’t a nice solution - it is still a global templated operator, but there aren’t nice solutions. The only other one besides “do it manually” is with a macro.
With that technique, we can add the missing missing operators:
template <typename Enum, typename = type_safe::detail::enable_flag<Enum>>
constexpr type_safe::flag_mask<Enum> operator~(const Enum& e) noexcept
{
return type_safe::flag_mask<Enum>::all_set().reset(e);
}
template <typename Enum, typename = type_safe::detail::enable_flag<Enum>>
constexpr type_safe::flag_combo<Enum> operator|(const Enum& a, const Enum& b) noexcept
{
return type_safe::flag_combo<Enum>(a) | b;
}
We need to create a mask when building the complement of a flag, and a combination when we or two together.
Automatically using a correct flag_set_traits
The approach with the flag_set_traits
works and is non-intrusive.
It is a little bit ugly, however:
When you define your enum
you’ll have to close the namespace,
open the namespace of the flag_set_traits
,
specialize it, and then open the original one again,
if you need to add anything else.
It would be better if the default flag_set_traits
specialization would work on its own.
This can be done as well, on the cost of making it intrusive.
The default flag_set_traits
can check whether the argument is an enum
and whether it has a special enumerator, i.e. _flag_set_size
.
If that’s the case it inherits from std::true_type
and uses _flag_set_size
as the return value for size()
,
else it inherits from std::false_type
.
Conclusion
We’ve now created a way to implement flags simply by writing the following code:
enum class flags
{
a,
b,
c,
…
_flag_set_size
};
There is no need to assign powers of two, no need to use a macro or overload operators. It just works out of the box.
Furthermore it uses the type system to give the bitwise operations semantic information, so that the compiler can check common mistakes when misusing the operators. But unless the user deliberately want to make the “mistake”, it doesn’t need to care, as the use of the types are hidden away.
The full implementation is part of my type_safe library, and can be found here.
This blog post was written for my old blog design and ported over. If there are any issues, please let me know.