std::string_view accepting temporaries: good idea or horrible pitfall?
C++17 brings us std::string_view.
It is a really useful tool:
If you want to write a function that accepts some string but does not need ownership, i.e. a view,
use std::string_view.
It supports both const char* and std::string without any work,
and does not involve any heap allocations.
Further, it clearly signals intent: this function takes a view.
It doesn’t own anything, it just views it.
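To make that concrete, here is a minimal sketch of such a function. The names count_spaces and demo_count_spaces are made up for illustration; only std::string_view itself comes from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: it only reads the characters and never owns them.
std::size_t count_spaces(std::string_view str)
{
    std::size_t count = 0;
    for (char c : str)
        if (c == ' ')
            ++count;
    return count;
}

// Both call styles work without overloads and without copying the string.
void demo_count_spaces()
{
    const char* literal = "a b c";
    std::string owned   = "hello world";

    assert(count_spaces(literal) == 2); // const char* converts implicitly
    assert(count_spaces(owned) == 1);   // so does std::string
}
```

A single signature covers both argument types, and the parameter type tells the reader that nothing is kept after the call.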
As someone who frequently advocates for using correct types,
I am happy about std::string_view.
Yet there is one design decision that warrants a discussion:
std::string_view silently views temporaries as well.
This can create a problem if the view lives longer than the temporary,
as the view now views already destroyed data.
Let’s look into the reasons behind this decision
and what that means for using std::string_view.
The problem of accepting temporaries
Suppose you’re writing a class that stores some std::string,
with a getter function to get that string:
class foo
{
    std::string my_str_;

public:
    const std::string& get_str() const
    {
        return my_str_;
    }
    …
};
The getter returns the string by const reference.
Now this exposes the fact that you’re using std::string internally
and a client might start to depend on that.
If you later decide to switch to a different string type,
even std::string with a different kind of allocator,
you’ll have to change the return type, which is an API change.
However, you can use std::string_view here to solve that problem:
std::string_view get_str() const
{
    return my_str_;
}
Now you can internally use any string implementation as long as it stores chars in a contiguous buffer,
and the user doesn’t need to care.
That’s the beauty of correct abstractions and std::string_view.
However, requirements on foo change and one day shortly before release you need to store additional information in that string.
There is no time for a proper refactor now, so you go ahead and add the additional information - maybe some kind of prefix character? - to the string.
And late at night you quickly change the getter so that it doesn’t return the whole string, but a substring:
std::string_view get_str() const
{
    // substr starting at index 1 till the end
    return my_str_.substr(1u);
}
Do you think that code works?
More importantly: Do you think it should work? The second answer is “definitely”: you are simply creating a view on some part of the string, what’s the problem?
The problem is that std::string::substr(), which is being called here,
returns std::string; a temporary std::string.
So we’re creating a view to a temporary object that is destroyed when the function returns; the view dangles, and using it is undefined behavior.
The correct solution requires an explicit conversion to std::string_view first:
std::string_view get_str() const
{
    return std::string_view(my_str_).substr(1u);
}
The view version of substr() correctly returns a view here and we don’t have a problem.
But this is a very subtle change and not intuitive.
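For reference, here is a self-contained sketch of the fixed getter. The constructor and the "#payload" value are assumptions added so the snippet can be compiled on its own; they are not part of the original class.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Illustrative variant of the class above; the constructor is an assumption.
class foo
{
    std::string my_str_;

public:
    explicit foo(std::string str) : my_str_(std::move(str)) {}

    std::string_view get_str() const
    {
        // Convert to a view first, so substr() is std::string_view::substr()
        // and returns a view into my_str_ rather than a temporary std::string.
        return std::string_view(my_str_).substr(1u);
    }
};

void demo_get_str()
{
    foo f("#payload"); // '#' plays the role of the prefix character
    std::string_view v = f.get_str();
    assert(v == "payload");
}
```

Note that substr(1u) throws std::out_of_range when the stored string is empty, so a real getter would probably want to handle that case.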
Now the main problem here is the return type of std::string::substr(); it should be changed to std::string_view.
And this is also just one aspect of the general problem of dangling references,
which isn’t solved in C++.
But in this instance it would have been very easy to prevent.
If std::string_view accepted only lvalues, and not temporaries,
the problematic code wouldn’t compile.
While this still would allow dangling references, it prevents stupid mistakes like these.
And even if you prevent only one error, that’s still better than preventing no errors.
For the sake of completeness: There is also a clang-tidy check that should work here. But clang-tidy isn’t your compiler.
So why does std::string_view allow temporaries?
The people on the standards committee aren’t stupid,
they knew that std::string_view would allow temporaries.
And they also knew how to prevent std::string_view from accepting temporaries.
So what’s the reason behind their decision?
The answer is the biggest use case of std::string_view:
The benefit of accepting temporaries
std::string_view is perfect for non-owning string parameters:
void do_sth(std::string_view str);
Any function taking const char* or const std::string& should be updated to use std::string_view.
That holds as long as the function does not take ownership. If it does, consider a by-value parameter that you move from, or overloading on const& and &&.
And if you use std::string_view as a function parameter,
you will never run into a temporary issue:
do_sth(std::string("hi").substr(1u));
Here we still pass a temporary that will be destroyed at the end of the full expression, but when that’s happening, the function call is already over! As long as the function doesn’t copy the view somewhere, there is no problem.
Furthermore, accepting temporaries is not only working, but it is also desired:
std::string get_a_temporary_string();
…
do_sth(get_a_temporary_string());
If std::string_view didn’t accept temporaries, you’d have to use:
auto tmp = get_a_temporary_string();
do_sth(tmp);
And that might be too verbose.
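The two calls above can be put into a compilable sketch. The bodies of get_a_temporary_string() and do_sth() are invented here, and do_sth() returns the length only so that the result can be checked; the article's version returns void.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for get_a_temporary_string(); the body is made up.
std::string get_a_temporary_string()
{
    return "temporary contents";
}

// Stand-in for do_sth(): uses the view during the call and does not keep it.
std::size_t do_sth(std::string_view str)
{
    return str.size();
}

void demo_temporary_argument()
{
    // The temporary std::string lives until the end of the full expression,
    // i.e. until after do_sth() has returned.
    assert(do_sth(get_a_temporary_string()) == 18);
    assert(do_sth(std::string("hi").substr(1u)) == 1);
}
```

This stays safe only because do_sth() does not copy the view anywhere that outlives the call.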
So how should you use std::string_view then?
Guideline
It is completely safe to use std::string_view in function parameters if the function needs a non-owning view of a string and doesn’t need to store that view somewhere else.
Be careful when using std::string_view in return values. Ensure that the function doesn’t return a temporary. Be careful when calling std::string::substr().
Be very careful when storing a std::string_view somewhere, e.g. in a class object. Ensure that the viewed string outlives the view.
Consider avoiding std::string_view as local variable type, use auto&& instead.
I haven’t talked about the last point:
It might be desired to create a view locally in some function.
There you can also run into the dangling reference issue.
If you use a real reference instead, however, lifetime extension ensures that the temporaries live long enough.
This is something std::string_view can’t offer you.
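A small sketch of that last point, assuming a hypothetical make_greeting() that returns a std::string by value:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical function returning a string by value.
std::string make_greeting()
{
    return "hello, world";
}

void demo_local_reference()
{
    // Dangerous alternative (left commented out): the view would outlive
    // the temporary it points into.
    // std::string_view dangling = make_greeting();

    // Binding the temporary to a reference extends its lifetime to the end
    // of the enclosing scope.
    auto&& str = make_greeting();

    // A view created from the named object is fine while str is alive.
    std::string_view view = str;
    assert(view.substr(0, 5) == "hello");
}
```

If make_greeting() were later changed to return a reference to a longer-lived string, the auto&& line would keep working and simply bind to that object without copying.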
Now while this guideline seems reasonable, I’m not happy with it. There are too many “be careful” in that guideline. C++’s already complicated enough, let’s not add more complexity.
And there is a better solution: Use my old friend the type system.
function_view vs function_ref
A while back Vittorio Romeo published a post about a function_view implementation.
function_view is the std::string_view equivalent of std::function.
And like std::string_view it accepted temporaries, as it was designed as a replacement for the template <typename Functor> void do_sth(data_t data, Functor callback) idiom.
Instead of passing the callback via template parameter, function_view can be used.
It allows all functions with a given signature.
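To give an idea of what such a type can look like, here is a minimal sketch. It is not Vittorio's implementation, just one common way to write a non-owning callable wrapper: a pointer to the callable plus a function that knows how to invoke it. apply_twice is a made-up consumer. As a simplification, plain functions passed by name are not supported in this sketch, and the return type of the callable is assumed to match the signature.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class function_view; // only the partial specialization below is defined

// Sketch: never owns or copies the callable, it only points to it.
template <typename Return, typename... Args>
class function_view<Return(Args...)>
{
    void* callable_;
    Return (*invoke_)(void*, Args...);

public:
    // Accepts lvalues and temporaries alike, which is what a parameter wants.
    template <typename Functor,
              typename = std::enable_if_t<
                  !std::is_same_v<std::decay_t<Functor>, function_view>>>
    function_view(Functor&& f) noexcept
    : callable_(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
      invoke_([](void* callable, Args... args) -> Return {
          using pointer = std::add_pointer_t<Functor>;
          return (*static_cast<pointer>(callable))(std::forward<Args>(args)...);
      })
    {}

    Return operator()(Args... args) const
    {
        return invoke_(callable_, std::forward<Args>(args)...);
    }
};

// Hypothetical consumer: replaces the "template <typename Functor>" parameter.
int apply_twice(int value, function_view<int(int)> callback)
{
    return callback(callback(value));
}

void demo_function_view()
{
    int offset = 3;
    // A temporary lambda is fine: it outlives the call to apply_twice().
    assert(apply_twice(1, [&](int x) { return x + offset; }) == 7);

    auto doubler = [](int x) { return x * 2; };
    assert(apply_twice(5, doubler) == 20);
}
```

Because apply_twice is no longer a template, it can live in a source file, while callers can still pass lambdas directly.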
Now around the time he wrote his implementation, I was working on object_ref of my type_safe library.
object_ref is basically a non-null pointer.
Now as object_ref is meant to store a lasting reference, e.g. as a member of a class,
it should not accept rvalues.
After all you can’t point to a temporary either.
So when I read Vittorio’s post, I decided “it shouldn’t accept temporaries”.
So I wrote a function_view implementation that doesn’t accept temporaries.
I called it function_ref to be consistent with the object_ref I already had.
I blogged about it, as a function_view that doesn’t accept temporaries is harder than you might think.
After the post there was a discussion on reddit. They - correctly - pointed out that not accepting temporaries made it awkward to use as a function parameter.
And then it hit me: function_view and function_ref are two orthogonal things!
function_view is designed for function parameters, function_ref is designed for everything else.
function_view should accept temporaries as this is useful and safe for function parameters,
function_ref must not.
View and ref types
As a non-owning reference used as a parameter requires different semantics than a non-owning reference used anywhere else, it makes sense to create two separate types for that.
One type - the view - is designed for parameters.
It should accept temporaries.
Regular const T& also qualifies as a view type.
The other one - the ref - is designed for the other use cases.
It should not accept temporaries.
Furthermore the constructor should be made explicit,
to highlight the fact that you are creating a long-living reference:
view_string(str);
refer_to_string(string_ref(str));
transfer_string(std::move(str));
Now it is clear at the call site what each function does and where you need to be careful about lifetime.
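A ref type along those lines could be sketched as follows. string_ref here is a hypothetical class written for this post, not a standard or library type, and message_sink is a made-up consumer that stores the reference.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical ref type: same data as a view, but stricter construction.
class string_ref
{
    std::string_view view_;

public:
    // explicit: the caller has to spell out string_ref(str).
    explicit string_ref(const std::string& str) noexcept : view_(str) {}

    // Deleted: a temporary std::string cannot be turned into a string_ref.
    string_ref(const std::string&&) = delete;

    std::string_view get() const noexcept { return view_; }
};

static_assert(std::is_constructible_v<string_ref, std::string&>);
static_assert(!std::is_constructible_v<string_ref, std::string>);  // no rvalues
static_assert(!std::is_convertible_v<std::string&, string_ref>);   // no implicit

// Hypothetical consumer that keeps the reference beyond the call.
class message_sink
{
    string_ref prefix_;

public:
    explicit message_sink(string_ref prefix) : prefix_(prefix) {}

    std::string_view prefix() const { return prefix_.get(); }
};

void demo_string_ref()
{
    std::string prefix = "[app] ";
    message_sink sink{string_ref(prefix)}; // explicit at the call site
    assert(sink.prefix() == "[app] ");
}
```

One side effect of this particular sketch is that string_ref("literal") is rejected as well, because the literal would first be converted to a temporary std::string; a fuller implementation might add a constructor for string literals.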
A pointer can be seen as a ref type, as it does not bind to temporaries and it has an explicit syntax when you create it (&str).
However, it is an optional ref type, as it can be null.
A non-const lvalue reference almost qualifies as ref type, the only thing missing is the explicit syntax to create it.
I named them XXX_view and XXX_ref, but the actual names aren’t important.
What is important is that I can suggest a refined guideline:
Guideline
If you need a non-owning reference to something, use either a view or a ref type.
Use a view type only as function parameter, where the view isn’t stored somewhere else. View types should only live a short life.
Use a ref type for everything else, like return values or storing it in an object. Also use a ref type as function parameter where the ref will be stored somewhere else, and the caller has to ensure the lifetime works.
When using ref types you have to be careful about the lifetime, just like if you were using a pointer.
Conclusion
The standard library doesn’t provide std::string_ref with the intended semantics,
and it is probably too late to add it now.
So you’ll have to follow my first guideline there and just be careful about temporaries,
as the compiler can’t remind you.
But you can view or ref a lot of other things like arrays, functions, etc. So when designing your own view types, consider also providing the corresponding ref type. They can easily share an implementation as the only difference is in the constructor.
But for many types you don’t need special view types.
const T& is perfect if you need to view just a single type.
And you can either use ts::object_ref, gsl::not_null or simply T* as a ref type for a regular object.
The final guideline only covers one case of function parameters:
Parameters which are simply passed to a function.
The two other cases are input and output parameters.
For input parameters use pass by value or overload on const T& and T&&.
But what to do for output parameters?
This blog post got you covered as well.