foonathan::blog()

Thoughts from a C++ library developer.

std::string_view accepting temporaries: good idea or horrible pitfall?

C++17 brings us std::string_view. It is a really useful tool: if you want to write a function that accepts some string but does not need ownership, i.e. a view, use std::string_view. It supports both const char* and std::string without any work, and does not involve any heap allocations. Furthermore, it clearly signals intent: this function takes a view. It doesn’t own anything, it just views it.

As someone who frequently advocates for using correct types, I am happy about std::string_view. Yet there is one design decision that warrants a discussion: std::string_view silently views temporaries as well. This can create a problem if the view lives longer than the temporary, as the view then refers to data that has already been destroyed.

Let’s look into the reasons behind this decision and what that means for using std::string_view.



The problem of accepting temporaries

Suppose you’re writing a class that stores some std::string, with a getter function to get that string:

class foo
{
    std::string my_str_;

public:
    const std::string& get_str() const
    {
        return my_str_;
    }

    
};

The getter returns the string by const reference. This exposes the fact that you’re using std::string internally, and a client might start to depend on that. If you later decide to switch to a different string type, even a std::basic_string with a different allocator, you’ll have to change the return type, which is an API change.

However, you can use std::string_view here to solve that problem:

std::string_view get_str() const
{
    return my_str_;
}

Now you can internally use any string implementation as long as it stores chars in a contiguous buffer, and the user doesn’t need to care. That’s the beauty of correct abstractions and std::string_view.

However, requirements on foo change, and one day shortly before release you need to store additional information in that string. As there is no time for a proper refactor, you go ahead and add the additional information (maybe some kind of prefix character?) to the string. And late at night you quickly change the getter so that it doesn’t return the whole string, but a substring:

std::string_view get_str() const
{
    // substr starting at index 1 till the end
    return my_str_.substr(1u);
}

Do you think that code works?

More importantly: do you think it should work? The answer to the second question is “definitely”: you are simply creating a view on some part of the string, so what’s the problem?

The problem is that std::string::substr(), which is being called here, returns std::string: a temporary std::string. So we’re creating a view to a temporary object that is destroyed at the end of the return statement, and any later use of the returned view is undefined behavior.

The correct solution requires an explicit conversion to std::string_view first:

std::string_view get_str() const
{
    return std::string_view(my_str_).substr(1u);
}

The view version of substr() correctly returns a view here and we don’t have a problem. But this is a very subtle change and not intuitive.
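A small sketch of the difference, using only the standard library (the helper names suffix_view and suffix_copy are made up for illustration): std::string_view::substr() returns a view into the original buffer, whereas std::string::substr() returns a new std::string with its own storage. Both throw std::out_of_range if the input string is empty.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The corrected getter logic: convert to a view first, then call substr().
// The result points into str's own buffer; nothing is copied.
std::string_view suffix_view(const std::string& str)
{
    return std::string_view(str).substr(1u);
}

// std::string::substr() instead yields a brand-new std::string with separate
// storage. Returning it as a std::string_view would view that temporary.
std::string suffix_copy(const std::string& str)
{
    return str.substr(1u);
}
```

Comparing data() pointers makes the difference visible: only the view shares the original buffer.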

Now the root problem here is the return type of std::string::substr(): it should be std::string_view. And this is also just one aspect of the general problem of dangling references, which C++ doesn’t solve.

But in this instance it would have been very easy to prevent. If std::string_view only accepted lvalues, and not temporaries, the problematic code wouldn’t compile. While this would still allow dangling references, it would prevent simple mistakes like this one. And even if you prevent only one error, that’s still better than preventing no errors.
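Here is a rough sketch of what “only accept lvalues” could look like. lvalue_string_view is a hypothetical wrapper, not how std::string_view is specified: it deletes the constructors taking std::string rvalues, so overload resolution picks a deleted function for a temporary and code like the late-night getter fails to compile.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical view type (not standard) that refuses to bind to temporaries.
class lvalue_string_view
{
public:
    // Binds to an existing std::string lvalue.
    lvalue_string_view(const std::string& str) noexcept : view_(str) {}

    // Rvalues prefer these deleted overloads over the const& one above,
    // so e.g. lvalue_string_view(my_str_.substr(1u)) is a compile error.
    lvalue_string_view(std::string&&) = delete;
    lvalue_string_view(const std::string&&) = delete;

    std::string_view get() const noexcept { return view_; }

private:
    std::string_view view_;
};

static_assert(std::is_constructible_v<lvalue_string_view, std::string&>);
static_assert(!std::is_constructible_v<lvalue_string_view, std::string>);
```

The same deleted-rvalue trick is exactly what makes such a type awkward as a function parameter, which is the trade-off discussed next.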

For the sake of completeness: There is also a clang-tidy check that should work here. But clang-tidy isn’t your compiler.

So why does std::string_view allow temporaries?

The people on the standards committee aren’t stupid; they knew that std::string_view would allow temporaries. And they also knew how to prevent std::string_view from accepting temporaries.

So what’s the reason behind their decision?

The answer is the biggest use case of std::string_view:

The benefit of accepting temporaries

std::string_view is perfect for non-owning string parameters:

void do_sth(std::string_view str);

Any function taking const char* or const std::string& should be updated to use std::string_view, as long as it does not take ownership. If it does, consider a by-value parameter that you move from, or overloading on const& and &&.
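As a sketch of the ownership case, assuming a made-up class label: the owning setter takes std::string by value and moves it into the member, while a function that only reads its argument takes std::string_view.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical class for illustration.
class label
{
public:
    // Takes ownership: lvalue arguments are copied once into the parameter,
    // rvalue arguments are moved; either way the parameter is then moved in.
    void set_text(std::string text) { text_ = std::move(text); }

    // Only reads its argument: a non-owning view is enough.
    bool has_text(std::string_view text) const { return std::string_view(text_) == text; }

    const std::string& text() const noexcept { return text_; }

private:
    std::string text_;
};
```

The alternative mentioned above, overloading set_text() on const std::string& and std::string&&, avoids the extra move at the cost of writing two functions.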

And if you use std::string_view as a function parameter, you won’t run into problems with temporaries:

do_sth(std::string("hi").substr(1u));

Here we still pass a temporary that will be destroyed at the end of the full expression, but by the time that happens, the function call is already over! As long as the function doesn’t store a copy of the view somewhere that outlives the call, there is no problem.
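To make the lifetime argument concrete, here is a sketch in which do_sth() is a hypothetical stand-in that only measures the view, and get_a_temporary_string() returns an arbitrary placeholder string. In both calls the temporary std::string is still alive while do_sth() runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in: reads the view, never stores it.
std::size_t do_sth(std::string_view str)
{
    return str.size();
}

// Hypothetical factory returning a temporary (placeholder contents).
std::string get_a_temporary_string()
{
    return "temporary";
}

// The temporary created by substr() lives until the end of the full
// expression, i.e. until after do_sth() has returned.
std::size_t call_with_substr()
{
    return do_sth(std::string("hi").substr(1u));
}

std::size_t call_with_factory()
{
    return do_sth(get_a_temporary_string());
}
```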

Furthermore, accepting temporaries not only works, it is also desirable:

std::string get_a_temporary_string();

do_sth(get_a_temporary_string());

If std::string_view didn’t accept temporaries, you’d have to write:

auto tmp = get_a_temporary_string();
do_sth(tmp);

And that might be too verbose.

So how should you use std::string_view, then?


Guideline

It is completely safe to use std::string_view in function parameters if the function needs a non-owning view of a string and doesn’t need to store that view somewhere else.

Be careful when using std::string_view in return values. Ensure that the function doesn’t return a view of a temporary. Be careful when calling std::string::substr().

Be very careful when storing a std::string_view somewhere, e.g. in a class object. Ensure that the viewed string outlives the view.

Consider avoiding std::string_view as a local variable type; use auto&& instead.


I haven’t talked about the last point yet: it might be desirable to create a view locally in some function. There you can also run into the dangling reference issue. If you use a real reference instead, however, lifetime extension ensures that the temporary lives long enough. This is something std::string_view can’t offer you.
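A minimal sketch of the local-variable case, with make_greeting() as a hypothetical function returning a temporary std::string. Binding the result to auto&& extends the temporary’s lifetime to that of the reference, so a view created from it afterwards is safe; binding the result directly to a std::string_view would not be.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical function returning a temporary.
std::string make_greeting()
{
    return "hello world";
}

std::string second_word()
{
    // std::string_view bad = make_greeting();
    // ^ compiles, but the temporary dies at the end of that statement,
    //   so every later use of `bad` would be undefined behavior.

    // auto&& deduces std::string&& and extends the temporary's lifetime
    // to the end of this scope.
    auto&& greeting = make_greeting();

    // Safe: greeting outlives this view.
    std::string_view view = greeting;
    return std::string(view.substr(6u));
}
```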

Now while this guideline seems reasonable, I’m not happy with it. There are too many “be careful”s in that guideline. C++ is already complicated enough; let’s not add more complexity.

And there is a better solution: Use my old friend the type system.

function_view vs function_ref

A while back Vittorio Romeo published a post about a function_view implementation. function_view is the std::string_view equivalent of std::function. And like std::string_view, it accepted temporaries, as it was designed as a replacement for the template <typename Functor> void do_sth(data_t data, Functor callback) idiom.

I know. Deal with it.

Instead of passing the callback via a template parameter, you can use function_view. It accepts any callable with the given signature.
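The following is a minimal sketch of the idea, not Vittorio Romeo’s actual implementation: function_view stores a type-erased pointer to the callable plus a function pointer that knows how to invoke it. It deliberately accepts temporaries such as a lambda written at the call site, which is safe as long as the view does not outlive the call. For brevity it only handles function objects, not plain functions; apply_twice is a made-up example function.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class function_view;

template <typename Return, typename... Args>
class function_view<Return(Args...)>
{
public:
    // Accepts any function object (lvalue or temporary) callable with Args.
    // Excluding function_view itself keeps copies on the copy constructor.
    template <typename F,
              typename = std::enable_if_t<
                  !std::is_same_v<std::decay_t<F>, function_view> &&
                  std::is_invocable_r_v<Return, F&, Args...>>>
    function_view(F&& f) noexcept
        : object_(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
          callback_([](void* obj, Args... args) -> Return {
              // Recover the original type and forward the call.
              return std::invoke(*static_cast<std::remove_reference_t<F>*>(obj),
                                 std::forward<Args>(args)...);
          })
    {}

    Return operator()(Args... args) const
    {
        return callback_(object_, std::forward<Args>(args)...);
    }

private:
    void* object_;                        // the viewed callable, type-erased
    Return (*callback_)(void*, Args...);  // knows how to call object_
};

// Replaces template <typename Functor> int apply_twice(int, Functor).
int apply_twice(int value, function_view<int(int)> callback)
{
    return callback(callback(value));
}
```

A call like apply_twice(3, [](int x) { return x + 1; }) is fine: the lambda temporary lives until the end of the full expression.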

Around the time he wrote his implementation, I was working on object_ref in my type_safe library. object_ref is basically a non-null pointer. As object_ref is meant to store a lasting reference, e.g. as a member of a class, it should not accept rvalues. After all, you can’t take the address of a temporary either.

So when I read Vittorio’s post, I decided: “it shouldn’t accept temporaries”. So I wrote a function_view implementation that doesn’t accept temporaries. I called it function_ref to be consistent with the object_ref I already had. I blogged about it, as writing a function_view that doesn’t accept temporaries is harder than you might think.

After the post there was a discussion on reddit. Commenters correctly pointed out that not accepting temporaries made it awkward to use as a function parameter.

And then it hit me: function_view and function_ref are two orthogonal things! function_view is designed for function parameters, function_ref is designed for everything else. function_view should accept temporaries, as this is useful and safe for function parameters; function_ref must not.
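For contrast, a minimal sketch of the function_ref side, not the actual type_safe implementation: the constructor is explicit and takes the callable by lvalue reference, and a deleted overload for rvalues turns an attempt to bind a temporary into a compile error. The button class is hypothetical and only illustrates the “stored somewhere else” use case; only function objects are handled, not plain functions.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class function_ref;

template <typename Return, typename... Args>
class function_ref<Return(Args...)>
{
    // Excludes function_ref itself so that copies use the copy constructor.
    template <typename F>
    using enable_for = std::enable_if_t<
        !std::is_same_v<std::remove_cv_t<F>, function_ref> &&
        std::is_invocable_r_v<Return, F&, Args...>>;

public:
    // Binds only to lvalues: the caller promises the callable outlives us.
    template <typename F, typename = enable_for<F>>
    explicit function_ref(F& f) noexcept
        : object_(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
          callback_([](void* obj, Args... args) -> Return {
              return std::invoke(*static_cast<F*>(obj), std::forward<Args>(args)...);
          })
    {}

    // Rvalues prefer this deleted overload, so temporaries are rejected.
    template <typename F, typename = enable_for<F>>
    function_ref(const F&&) = delete;

    Return operator()(Args... args) const
    {
        return callback_(object_, std::forward<Args>(args)...);
    }

private:
    void* object_;
    Return (*callback_)(void*, Args...);
};

// Hypothetical class storing a callback for later use.
class button
{
public:
    explicit button(function_ref<int(int)> on_click) noexcept : on_click_(on_click) {}
    int click(int value) const { return on_click_(value); }

private:
    function_ref<int(int)> on_click_;
};
```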

View and ref types

As a non-owning reference used as a parameter requires different semantics than a non-owning reference used anywhere else, it makes sense to create two separate types for that.

One type, the view, is designed for parameters. It should accept temporaries. A regular const T& also qualifies as a view type.

The other one, the ref, is designed for the other use cases. It should not accept temporaries. Furthermore, its constructor should be explicit, to highlight the fact that you are creating a long-lived reference:

view_string(str);
refer_to_string(string_ref(str));
transfer_string(std::move(str));
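A sketch of how those three call-site forms might be backed by code. string_ref and string_registry are hypothetical: string_ref has an explicit constructor and rejects temporaries, view_string() takes a std::string_view, refer_to_string() stores a string_ref, and transfer_string() takes ownership by value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical ref type for strings, not part of the standard library.
class string_ref
{
public:
    // Explicit: creating a long-lived reference must be spelled out.
    explicit string_ref(const std::string& str) noexcept : view_(str) {}

    // Rvalues prefer this deleted overload, so temporaries are rejected.
    string_ref(const std::string&&) = delete;

    std::string_view get() const noexcept { return view_; }

private:
    std::string_view view_;
};

static_assert(!std::is_convertible_v<const std::string&, string_ref>,
              "string_ref must be created explicitly");
static_assert(!std::is_constructible_v<string_ref, std::string>,
              "string_ref must not bind to temporaries");

// Hypothetical class offering the three call-site forms.
class string_registry
{
public:
    // view: only reads the string during the call
    std::size_t view_string(std::string_view str) const { return str.size(); }

    // ref: stores a reference; the caller must keep the string alive
    void refer_to_string(string_ref str) { refs_.push_back(str); }

    // transfer: takes ownership of the string
    void transfer_string(std::string str) { owned_.push_back(std::move(str)); }

    std::string_view ref_at(std::size_t i) const { return refs_.at(i).get(); }
    const std::string& owned_at(std::size_t i) const { return owned_.at(i); }

private:
    std::vector<string_ref> refs_;
    std::vector<std::string> owned_;
};
```

With this sketch, reg.refer_to_string(string_ref(name)) compiles for a named std::string, while passing a temporary such as string_ref(std::string("tmp")) does not.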

Now it is clear at the call site what each function does and where you need to be careful about lifetime.

A pointer can be seen as a ref type, as it does not bind to temporaries and it has an explicit syntax when you create it (&str). However, it is an optional ref type, as it can be null. A non-const lvalue reference almost qualifies as a ref type; the only thing missing is the explicit syntax to create it.

I named them XXX_view and XXX_ref, but the actual names aren’t important. What’s important is that I can suggest a refined guideline:


Guideline

If you need a non-owning reference to something, use either a view or a ref type.

Use a view type only as a function parameter, where the view isn’t stored somewhere else. View types should be short-lived.

Use a ref type for everything else, like return values or storing it in an object. Also use a ref type as a function parameter where the ref will be stored somewhere else and the caller has to ensure the lifetime works.

When using ref types you have to be careful about the lifetime, just like if you were using a pointer.



Conclusion

The standard library doesn’t provide std::string_ref with the intended semantics, and it is probably too late to add it now. So you’ll have to follow my first guideline there and just be careful about temporaries, as the compiler can’t remind you.

But you can view or ref a lot of other things like arrays, functions, etc. So when designing your own view types, consider also providing the corresponding ref type. They can easily share an implementation, as the only difference is in the constructor.
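A sketch of the shared-implementation idea for arrays, under the assumption that both types only need read access to a std::vector: array_view and array_ref are hypothetical names that derive from a common base holding a pointer and a size, and differ only in their constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace detail
{
    // Shared implementation: a pointer plus a size with read-only access.
    template <typename T>
    class array_span_base
    {
    public:
        const T* data() const noexcept { return data_; }
        std::size_t size() const noexcept { return size_; }
        const T& operator[](std::size_t i) const noexcept { return data_[i]; }

    protected:
        array_span_base(const T* ptr, std::size_t count) noexcept
            : data_(ptr), size_(count) {}

    private:
        const T* data_;
        std::size_t size_;
    };
} // namespace detail

// View type: implicit and accepts temporaries; meant for parameters.
template <typename T>
class array_view : public detail::array_span_base<T>
{
public:
    array_view(const std::vector<T>& vec) noexcept
        : detail::array_span_base<T>(vec.data(), vec.size()) {}
};

// Ref type: explicit and rejects temporaries; meant for storage.
template <typename T>
class array_ref : public detail::array_span_base<T>
{
public:
    explicit array_ref(const std::vector<T>& vec) noexcept
        : detail::array_span_base<T>(vec.data(), vec.size()) {}

    array_ref(const std::vector<T>&&) = delete;
};

static_assert(std::is_convertible_v<std::vector<int>, array_view<int>>);
static_assert(!std::is_constructible_v<array_ref<int>, std::vector<int>>);
static_assert(!std::is_convertible_v<const std::vector<int>&, array_ref<int>>);

// Example parameter: summing may safely view a temporary vector.
int sum(array_view<int> values)
{
    int total = 0;
    for (std::size_t i = 0; i != values.size(); ++i)
        total += values[i];
    return total;
}
```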

But for many types you don’t need special view types. const T& is perfect if you need to view just a single object. And you can use ts::object_ref, gsl::not_null or simply T* as a ref type for a regular object.

The final guideline only covers one kind of function parameter: parameters that are simply passed to a function. The two other kinds are input and output parameters. For input parameters, use pass by value or overload on const T& and T&&. But what to do for output parameters? This blog post has you covered as well.