Tutorial: C++20’s Iterator Sentinels

You probably know that C++20 adds ranges. Finally we can write copy(container, dest) instead of copy(container.begin(), container.end(), dest)!

Ranges also do a lot more. Among other things, they add a new way of specifying an iterator to the end – sentinels.

Motivation

Suppose you want to copy a null-terminated string into some buffer (excluding the final null terminator). No problem, you quickly write the loop:

void do_sth(const char* str)
{
    std::vector<char> buffer;
    while (*str)
    {
        buffer.push_back(*str);
        ++str;
    }

    // continue doing something
}

We keep incrementing the str pointer, and inserting the character until we’ve reached the null character. Straightforward stuff.

However, that’s a raw loop, which is considered bad style in certain situations. Instead, we should use an STL algorithm, in this case std::copy. With std::copy the code looks like this:

void do_sth(const char* str)
{
    std::vector<char> buffer;
    std::copy(str, str + std::strlen(str),
              std::back_inserter(buffer));

    // continue doing something
}

We pass std::copy the iterator range and use std::back_inserter as the output iterator. It will repeatedly call .push_back(), just like the code above. But note the way we specify the range: the begin iterator is str and the end iterator is str + std::strlen(str), that is a pointer to the null terminator. By saying str + std::strlen(str) for the end, std::strlen() needs to iterate over the string and find the end – we end up with two loops instead of one! The first loop for finding the end, and then a second loop to copy all characters. In the first version, we combined both loops into one, by checking for the end while copying.

Can we achieve the same using the algorithms?

Note that if we have to call std::strlen() anyway, we might as well reserve the appropriate amount of memory up front instead of dynamically resizing, which might make the two loop version better. But let’s ignore that here.

The problem

An iterator in C++ is a generalized pointer. We can dereference it to get the current value, increment it to move to the next value, and compare it with other iterators. As a consequence, a range is specified with two iterators: one to the beginning and one past the end. When we iterate over a range, we repeatedly increment the first one, until it is equal to the one past the end:

for (auto iter = begin; iter != end; ++iter)
{
    auto value = *iter;
    
}

This works fine for containers whose sizes are known, but not for null-terminated strings. For a null-terminated string, we don’t know the end up front, we can only detect it while iterating. This makes it incompatible with the C++ iterators.

In other languages, iterators are defined differently. A range is not defined by an iterator pair, but by a single object: We can get the current value and advance it, but we can also ask the iterator itself whether it is done. There range iteration can look something like this:

for (auto iter = begin; !iter.is_done(); iter.advance())
{
    auto value = iter.get();
    
}

With such an iterator concept, it is trivial to iterate over a null-terminated string:

class zstring_iterator
{
public:
    bool is_done() const
    {
        return *cur_ == '\0';
    }

    char get() const
    {
        return *cur_;
    }

    void advance()
    {
        ++cur_;
    }

private:
    const char* cur_;
};

Because we ask the iterator whether it is done instead of comparing it to some other iterator position, we can just check for the null character as we did with the while loop version above. We want to allow the same with the C++ iterators.

The solution

When we spell “is this iterator at the end?” as iter.is_done(), we can easily put in a check for the null character. However, we spell it iter == end. We need to somehow turn iter == end into something equivalent to *iter != '\0'. Luckily there is a way of doing that: operator overloading.

Instead of having end as just some other iterator (a const char* in our case), we give the end iterator a distinct type. This new “end-only” iterator cannot be dereferenced. All we can do is compare it to a “normal” iterator. This equality check has the semantic meaning of asking the iterator whether it is at the end.

In the C++20 standard library, such an end-only iterator is called a sentinel. It looks something like this:

class iterator
{
    // Some iterator, with *, ++, etc.
};

// We still want to be able to compare two iterators.
bool operator==(iterator lhs, iterator rhs);
bool operator!=(iterator lhs, iterator rhs);

// The special end-only iterator.
// It is usually an empty type, we don't actually need any objects.
// It's just there because `==` takes two parameters.
class sentinel {};

bool operator==(iterator iter, sentinel)
{
    return /* is iter done? */;
}
bool operator!=(iterator iter, sentinel)
{
    return /* is iter not done? */;
}

bool operator==(sentinel, iterator iter);
bool operator!=(sentinel, iterator iter);

Note that in C++20, we don’t need any operator!= overloads or the operator== overloads with swapped arguments. When the compiler sees sentinel{} != iter, it will be rewritten to !(iter == sentinel{}). Watch my talk for the full details.

A sentinel for a null terminated string is now simple to implement. Note that the iterator type is still the plain old const char*, there is no need to change that.

// Empty type.
struct zstring_sentinel {};

// Are we done?
bool operator==(const char* str, zstring_sentinel)
{
    return *str == '\0';
}

// != and reversed operators not needed in C++20.

That’s it, that’s all that is required. Now we can write our copy code like this:

void do_sth(const char* str)
{
    std::vector<char> buffer;
    std::copy(str, zstring_sentinel{}, std::back_inserter(buffer));

    // continue doing something
}

Instead of passing str + std::strlen(str), we give it the sentinel type. Internally, the algorithm will have a loop that increments str until it is equal to the end iterator. In our case, the end iterator is the sentinel, so we invoke the operator== that checks whether we’ve reached the null terminator. No two loops required.

Except… it doesn’t compile.

You see, while we haven’t actually changed anything about the iterator concept, we’ve changed the way we specified a range. Previously, we passed two iterators that had the same type, now we don’t. And the signature of std::copy() requires that the first two arguments have the same type.

Deploying the new iterator and sentinel ranges requires some small cooperation in the signature.

The new C++20 rangified algorithms have done that, so instead of calling std::copy() we have to call std::ranges::copy():

void do_sth(const char* str)
{
    std::vector<char> buffer;
    std::ranges::copy(str, zstring_sentinel{},
                      std::back_inserter(buffer));

    // continue doing something
}

Note that std::ranges::copy also gives us the final value of str, that is str + std::strlen(str). As such, it calculates the “old-style” end iterator for us as well!

This is why std::copy doesn’t support it, std::ranges::copy has a different return type, a struct containing the final str and output iterator.

Note that the language version, the range-based for loop, has received the appropriate update already in C++17, so with a little helper we can use a range-based for loop to iterate over a const char*:

struct zstring_range
{
    const char* str;

    auto begin() const
    {
        // The begin is just the pointer to our string.
        return str;
    }
    auto end() const
    {
        // The end is a different type, the sentinel.
        return zstring_sentinel{};
    }
};

void do_sth(const char* str)
{
    std::vector<char> buffer;
    for (auto c : zstring_range(str))
        buffer.push_back(c);

    // continue doing something
}

Note that in C++20 we don’t need a constructor. We can initialize an aggregate like zstring_range with parentheses as if it had a constructor. This makes () the new uniform initialization.

Conclusion

Whenever you have a range where the end is some dynamic condition instead of a fixed position, use an iterator and sentinel pair instead.

// Empty tag type.
struct sentinel {};

// Check whether the associated iterator is done.
bool operator==(iterator iter, sentinel);

In order to support that, all that is required of existing algorithms is to change their signatures from

template <typename I>
void algorithm(I begin, I end);

to

template <typename I, typename S>
void algorithm(I begin, S end);

As no other changes is required, you should start doing that now, even if there are no existing sentinels. It prepares your code for future range types.

Note that sentinels are not a general replacement for end iterators. For containers like std::vector, the end is just some known position, there is no need to introduce a sentinel. This still allows decrementing the end iterator to go backwards, something inherently impossible with sentinels.