Tutorial: C++20’s Iterator Sentinels
You probably know that C++20 adds ranges.
Finally we can write copy(container, dest)
instead of copy(container.begin(), container.end(), dest)
!
Ranges also do a lot more. Among other things, they add a new way of specifying an iterator to the end – sentinels.
Motivation
Suppose you want to copy a null-terminated string into some buffer (excluding the final null terminator). No problem, you quickly write the loop:
void do_sth(const char* str)
{
std::vector<char> buffer;
while (*str)
{
buffer.push_back(*str);
++str;
}
// continue doing something
}
We keep incrementing the str
pointer, and inserting the character until we’ve reached the null character.
Straightforward stuff.
However, that’s a raw loop, which is considered bad style in certain situations.
Instead, we should use an STL algorithm, in this case std::copy
.
With std::copy
the code looks like this:
void do_sth(const char* str)
{
std::vector<char> buffer;
std::copy(str, str + std::strlen(str),
std::back_inserter(buffer));
// continue doing something
}
We pass std::copy
the iterator range and use std::back_inserter
as the output iterator.
It will repeatedly call .push_back()
, just like the code above.
But note the way we specify the range: the begin iterator is str
and the end iterator is str + std::strlen(str)
, that is a pointer to the null terminator.
By saying str + std::strlen(str)
for the end, std::strlen()
needs to iterate over the string and find the end – we end up with two loops instead of one!
The first loop for finding the end, and then a second loop to copy all characters.
In the first version, we combined both loops into one, by checking for the end while copying.
Can we achieve the same using the algorithms?
Note that if we have to call
std::strlen()
anyway, we might as well reserve the appropriate amount of memory up front instead of dynamically resizing, which might make the two loop version better. But let’s ignore that here.
The problem
An iterator in C++ is a generalized pointer. We can dereference it to get the current value, increment it to move to the next value, and compare it with other iterators. As a consequence, a range is specified with two iterators: one to the beginning and one past the end. When we iterate over a range, we repeatedly increment the first one, until it is equal to the one past the end:
for (auto iter = begin; iter != end; ++iter)
{
auto value = *iter;
…
}
This works fine for containers whose sizes are known, but not for null-terminated strings. For a null-terminated string, we don’t know the end up front, we can only detect it while iterating. This makes it incompatible with the C++ iterators.
In other languages, iterators are defined differently. A range is not defined by an iterator pair, but by a single object: We can get the current value and advance it, but we can also ask the iterator itself whether it is done. There range iteration can look something like this:
for (auto iter = begin; !iter.is_done(); iter.advance())
{
auto value = iter.get();
…
}
With such an iterator concept, it is trivial to iterate over a null-terminated string:
class zstring_iterator
{
public:
bool is_done() const
{
return *cur_ == '\0';
}
char get() const
{
return *cur_;
}
void advance()
{
++cur_;
}
private:
const char* cur_;
};
Because we ask the iterator whether it is done instead of comparing it to some other iterator position, we can just check for the null character as we did with the while
loop version above.
We want to allow the same with the C++ iterators.
The solution
When we spell “is this iterator at the end?” as iter.is_done()
, we can easily put in a check for the null character.
However, we spell it iter == end
.
We need to somehow turn iter == end
into something equivalent to *iter != '\0'
.
Luckily there is a way of doing that: operator overloading.
Instead of having end
as just some other iterator (a const char*
in our case), we give the end iterator a distinct type.
This new “end-only” iterator cannot be dereferenced.
All we can do is compare it to a “normal” iterator.
This equality check has the semantic meaning of asking the iterator whether it is at the end.
In the C++20 standard library, such an end-only iterator is called a sentinel. It looks something like this:
class iterator
{
// Some iterator, with *, ++, etc.
};
// We still want to be able to compare two iterators.
bool operator==(iterator lhs, iterator rhs);
bool operator!=(iterator lhs, iterator rhs);
// The special end-only iterator.
// It is usually an empty type, we don't actually need any objects.
// It's just there because `==` takes two parameters.
class sentinel {};
bool operator==(iterator iter, sentinel)
{
return /* is iter done? */;
}
bool operator!=(iterator iter, sentinel)
{
return /* is iter not done? */;
}
bool operator==(sentinel, iterator iter);
bool operator!=(sentinel, iterator iter);
Note that in C++20, we don’t need any
operator!=
overloads or theoperator==
overloads with swapped arguments. When the compiler seessentinel{} != iter
, it will be rewritten to!(iter == sentinel{})
. Watch my talk for the full details.
A sentinel for a null terminated string is now simple to implement.
Note that the iterator type is still the plain old const char*
, there is no need to change that.
// Empty type.
struct zstring_sentinel {};
// Are we done?
bool operator==(const char* str, zstring_sentinel)
{
return *str == '\0';
}
// != and reversed operators not needed in C++20.
That’s it, that’s all that is required. Now we can write our copy code like this:
void do_sth(const char* str)
{
std::vector<char> buffer;
std::copy(str, zstring_sentinel{}, std::back_inserter(buffer));
// continue doing something
}
Instead of passing str + std::strlen(str)
, we give it the sentinel type.
Internally, the algorithm will have a loop that increments str
until it is equal to the end iterator.
In our case, the end iterator is the sentinel, so we invoke the operator==
that checks whether we’ve reached the null terminator.
No two loops required.
Except… it doesn’t compile.
You see, while we haven’t actually changed anything about the iterator concept, we’ve changed the way we specified a range.
Previously, we passed two iterators that had the same type, now we don’t.
And the signature of std::copy()
requires that the first two arguments have the same type.
Deploying the new iterator and sentinel ranges requires some small cooperation in the signature.
The new C++20 rangified algorithms have done that, so instead of calling std::copy()
we have to call std::ranges::copy()
:
void do_sth(const char* str)
{
std::vector<char> buffer;
std::ranges::copy(str, zstring_sentinel{},
std::back_inserter(buffer));
// continue doing something
}
Note that
std::ranges::copy
also gives us the final value ofstr
, that isstr + std::strlen(str)
. As such, it calculates the “old-style” end iterator for us as well!This is why
std::copy
doesn’t support it,std::ranges::copy
has a different return type, a struct containing the finalstr
and output iterator.
Note that the language version, the range-based for
loop, has received the appropriate update already in C++17, so with a little helper we can use a range-based for
loop to iterate over a const char*
:
struct zstring_range
{
const char* str;
auto begin() const
{
// The begin is just the pointer to our string.
return str;
}
auto end() const
{
// The end is a different type, the sentinel.
return zstring_sentinel{};
}
};
void do_sth(const char* str)
{
std::vector<char> buffer;
for (auto c : zstring_range(str))
buffer.push_back(c);
// continue doing something
}
Note that in C++20 we don’t need a constructor. We can initialize an aggregate like
zstring_range
with parentheses as if it had a constructor. This makes()
the new uniform initialization.
Conclusion
Whenever you have a range where the end is some dynamic condition instead of a fixed position, use an iterator and sentinel pair instead.
// Empty tag type.
struct sentinel {};
// Check whether the associated iterator is done.
bool operator==(iterator iter, sentinel);
In order to support that, all that is required of existing algorithms is to change their signatures from
template <typename I>
void algorithm(I begin, I end);
to
template <typename I, typename S>
void algorithm(I begin, S end);
As no other changes is required, you should start doing that now, even if there are no existing sentinels. It prepares your code for future range types.
Note that sentinels are not a general replacement for end iterators.
For containers like std::vector
, the end is just some known position, there is no need to introduce a sentinel.
This still allows decrementing the end iterator to go backwards, something inherently impossible with sentinels.