foonathan::blog()

Thoughts from a C++ library developer.

Prefer nonmember, nonfriends?

How many member functions does std::string have?

As of C++17 the answer is 153, assuming I counted correctly.

One Hundred Fifty Three.

That is a lot. And as Herb Sutter has pointed out, most of those members could easily be implemented as non-members without loss of performance.

And they should be implemented as nonmembers according to an old guideline from the C++ coding standards: Prefer nonmember, nonfriends. Write free functions whenever possible, not members.

But how true is that advice really?


Prefer nonmember, nonfriends

Scott Meyers made excellent points in Items 18 (Strive for class interfaces that are complete and minimal) and 19 (Differentiate among member functions, non-member functions, and friend functions), as did Guru of the Week #84 and many others, so I’m not going to repeat them all in great detail here.

The gist is: Big classes are more work to maintain, more difficult to understand, violate the single responsibility principle and lead to tighter coupling. Furthermore, they can lead to duplicated work if an algorithm that could be applied to multiple types is buried within a specific class. See the 30 - thirty! - find functions of std::string, of which 24 are ported over to std::string_view, most likely with the exact same implementation.

There are six groups (find, rfind, find_first_of, find_last_of, find_first_not_of, find_last_not_of) with five overloads each in std::string and four overloads each in std::string_view. The one overload missing from std::string_view is the one that searches for a std::string.

So the general idea is: If a function can be non-member, make it non-member. The definition of "can" is determined as follows (according to the C++ coding standards):

  • If the function is one of the operators =, ->, [], or (), which must be members: Make it a member.
  • Else if:

    • a) the function needs a different type as its left-hand argument (as do operators>> or <<, for example);
    • or b) it needs type conversions on its leftmost argument;
    • or c) it can be implemented using the class’s public interface alone:

    Make it a nonmember (and friend if needed in cases a) and b)). If it needs to behave virtually: Add a virtual member function to provide the virtual behavior, and implement the nonmember in terms of that.

  • Else: Make it a member.

In short: Make it a member if it has to be a member (special operators like operator=), and make it a nonmember if it has to be a nonmember (type conversion on arguments etc.). Otherwise the decision should simply be whether or not the function can be implemented efficiently using the existing public member functions alone. Furthermore, you should prefer member functions over friend functions.
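Cases a) and b) are easiest to see with operators. The following is a minimal sketch, assuming a hypothetical rational class that is not part of any library mentioned here: operator* needs a conversion on its leftmost argument so that `2 * half` works, and operator<< needs a stream as its left-hand argument. Both only use the public interface, so neither has to be a friend.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class for illustration only.
class rational
{
public:
    // Intentionally implicit, so that an int converts to a rational.
    constexpr rational(int num, int den = 1) : num_(num), den_(den) {}

    constexpr int num() const { return num_; }
    constexpr int den() const { return den_; }

private:
    int num_;
    int den_;
};

// Case b): as a nonmember, both operands may be converted from int.
// A member operator* would accept `half * 2` but reject `2 * half`.
constexpr rational operator*(const rational& lhs, const rational& rhs)
{
    return rational(lhs.num() * rhs.num(), lhs.den() * rhs.den());
}

// Case a): the left-hand argument is a stream, not a rational.
inline std::ostream& operator<<(std::ostream& out, const rational& r)
{
    return out << r.num() << '/' << r.den();
}

// Usage: no normalization is done in this sketch, so 2 * 1/2 prints as 2/2.
inline void example_rational()
{
    const rational half(1, 2);
    const rational product = 2 * half;
    assert(product.num() == 2 && product.den() == 2);

    std::ostringstream out;
    out << product;
    assert(out.str() == "2/2");
}
```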

However, there is a problem if you write nonmember functions instead of member functions: This is not an implementation detail but an obvious change for users as the calling syntax is different.

This leads to a variety of problems:

1. Nonmember functions make chaining awkward

Let’s start with syntax sugar issues and work our way up. If you have a nonmember function, chaining is awkward.

Consider my ts::optional implementation. Among others, it provides two member functions value_or() and map(). value_or() returns either the stored value or, if the optional is empty, a fallback value. map() applies a function to the stored value and returns an optional containing the transformed value, or an empty optional of the changed type if the original one was empty.

Both functions can easily be implemented without performance overhead using the has_value() and value() member functions:

template <typename T, typename U>
T value_or(const ts::optional<T>& optional, U&& fallback)
{
    return optional.has_value() ? optional.value() : std::forward<U>(fallback);
}

template <typename T, typename Func>
auto map(const ts::optional<T>& optional, Func f)
-> ts::optional<decltype(f(optional.value()))>
{
    return optional.has_value() ? ts::make_optional(f(optional.value())) : ts::nullopt;
}

The implementation glosses over a lot of details, but those aren’t important here.
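For readers who want something that compiles as-is, here is a rough equivalent written against std::optional instead of ts::optional. It is only a sketch: the namespace sketch and the helper times_two are made up for this example, and it still ignores rvalue optionals, functions returning references and similar details.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

namespace sketch // hypothetical namespace, not part of any library
{
    // Returns the stored value, or the fallback converted to T.
    template <typename T, typename U>
    T value_or(const std::optional<T>& opt, U&& fallback)
    {
        if (opt.has_value())
            return opt.value();
        return static_cast<T>(std::forward<U>(fallback));
    }

    // Applies f to the stored value, or propagates the empty state.
    template <typename T, typename Func>
    auto map(const std::optional<T>& opt, Func f)
        -> std::optional<std::decay_t<decltype(f(opt.value()))>>
    {
        if (opt.has_value())
            return f(opt.value());
        return std::nullopt;
    }

    // Hypothetical transformation used in the usage example.
    inline int times_two(int i)
    {
        return 2 * i;
    }
}

inline void example_optional()
{
    const std::optional<int> some = 21;
    const std::optional<int> none;

    assert(sketch::value_or(sketch::map(some, &sketch::times_two), 0) == 42);
    assert(sketch::value_or(sketch::map(none, &sketch::times_two), 0) == 0);
}
```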

However, those definitions of value_or() and especially map() completely defeat their purpose. As member functions, they allow simple and safe processing of optional values:

ts::optional<id> try_get_id();
T lookup(const id& i);

auto value = try_get_id()
             .map(&lookup) // get an optional<T>
             .map(&calculate_value) // get an optional value
             .value_or(42); // get the value or 42

This post isn’t trying to convince you of the beauty of that code, just accept it and compare it with the nonmember equivalent:

auto value = value_or(map(map(try_get_id(), &lookup), &calculate_value), 42);

This is almost impossible to read.

You are either forced to create a lot of temporaries:

auto id = try_get_id();
auto t = map(id, &lookup);
auto maybe_value = map(t, &calculate_value);
auto value = value_or(maybe_value, 42);

Or don’t use map at all:

auto value = 42;
if (auto id = try_get_id(); id.has_value())
{
    auto t = lookup(id.value());
    value = calculate_value(t);
}

I’m using C++17’s if with initializer to make it somewhat nicer. And here it works, but just imagine if lookup and calculate_value returned optionals themselves.

That’s why I had to make them member functions: I wanted easy chaining.

Note that this isn’t ideal either: My variant also has map() with a very similar implementation. If it were nonmember, I could have created a generic facility to provide map() for a certain category of types. However, I had to choose user experience over implementation experience.

2. Nonmember functions expose implementation details

Bear with me.

Consider a simple singly-linked list implementation. In order to minimize memory footprint we don’t store the size of the list in a separate variable. Instead we only store the pointer to the first node.

When we want to implement size(), we can easily do so in terms of the provided iterator interface, so we make it a nonmember function:

template <typename T>
std::size_t size(const my_list<T>& list)
{
    return std::distance(list.begin(), list.end());
}

However, if we chose to store the size as member variable, we would have made it a member function:

template <typename T>
std::size_t my_list<T>::size() const
{
    return size_;
}

The implementation of our list directly affected the user interface, in particular, whether or not size() would be a member or nonmember function.

Now you could argue that in this particular case, this would be a good thing. A list that stores the size has different applications than a list that doesn’t. However, this has a problem with generic code:

3. Nonmember functions can lead to problems in generic code

If we have one container where size() is a nonmember function, we can’t use it in any generic code that assumes a member size() function. And since all STL containers have a member size() function, most code would assume that as well.
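The mismatch can be made visible at compile time. The sketch below is an assumption-laden stand-in: my_list wraps a std::forward_list instead of a hand-written node chain, and has_member_size is a hypothetical detection trait built on std::void_t (C++17). It only shows that the list from above does not satisfy the expression cont.size() that generic code typically relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Stand-in for my_list from the text: no stored size, only iterators.
template <typename T>
class my_list
{
public:
    using const_iterator = typename std::forward_list<T>::const_iterator;

    void push_front(const T& value) { nodes_.push_front(value); }

    const_iterator begin() const { return nodes_.begin(); }
    const_iterator end() const { return nodes_.end(); }

private:
    std::forward_list<T> nodes_; // replaces the hand-written node chain
};

// Nonmember size(), implemented via the public iterator interface.
template <typename T>
std::size_t size(const my_list<T>& list)
{
    return static_cast<std::size_t>(std::distance(list.begin(), list.end()));
}

// Hypothetical trait: true if `cont.size()` is a valid expression.
template <typename Container, typename = void>
struct has_member_size : std::false_type
{
};

template <typename Container>
struct has_member_size<Container,
                       std::void_t<decltype(std::declval<const Container&>().size())>>
: std::true_type
{
};

static_assert(has_member_size<std::vector<int>>::value,
              "STL containers provide a member size()");
static_assert(!has_member_size<my_list<int>>::value,
              "my_list only provides a nonmember size()");
```

Any template that calls cont.size() directly would therefore fail to compile for my_list, even though size(list) works fine.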

But also:

4. Member functions can lead to problems in generic code

Suppose you want to get the size of a collection in a generic context:

template <typename Container>
void foo(const Container& cont)
{
    auto size = cont.size();
    
}

We call the member function as all STL containers have one. However, this leads to a problem in the following code:

int array[] = {1, 2, 3}; // any non-empty array; an empty initializer would be ill-formed
foo(array);

An array does not have a .size(); it can’t have any member functions! Instead, assume there is a nonmember size() that works for arrays. Then foo() would need to call that one.

The solution to both problems is to introduce a wrapper and call it instead:

template <typename T>
auto do_get_size_impl(int, const T& obj) -> decltype(obj.size())
{
    return obj.size();
}

template <typename T>
std::size_t do_get_size_impl(char, const T& obj)
{
    using my_array_size_namespace::size;
    return size(obj);
}

template <typename T>
std::size_t do_get_size(const T& obj)
{
    return do_get_size_impl(0, obj);
}

We have two do_get_size_impl() overloads: one calls a member function, the other one calls a nonmember function. SFINAE ensures that the correct overload is chosen.

To prevent problems in situations where a type has both a member and a nonmember function, there is an additional unnamed parameter: the member overload takes an int, the nonmember overload a char. This is priority tag dispatching. The wrapper calls the implementation with 0 as first argument, so overload resolution prefers the member overload as it is an exact match, and will only use the nonmember overload if SFINAE disables the first one.

What this means is that it will call the member function unless there is no member function, which is in general the right thing to do.
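To see all three cases in action, here is the wrapper again as a self-contained sketch. The contents of my_array_size_namespace are an assumption (the text only names it), and other::bag is a hypothetical type that only offers a nonmember size(), which the fallback overload finds via ADL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed contents of the namespace named in the text: array size only.
namespace my_array_size_namespace
{
    template <typename T, std::size_t N>
    constexpr std::size_t size(const T (&)[N]) noexcept
    {
        return N;
    }
}

// Preferred overload: only viable if obj.size() is well-formed (SFINAE).
template <typename T>
auto do_get_size_impl(int, const T& obj) -> decltype(obj.size())
{
    return obj.size();
}

// Fallback overload: array size, or a nonmember size() found via ADL.
template <typename T>
std::size_t do_get_size_impl(char, const T& obj)
{
    using my_array_size_namespace::size;
    return size(obj);
}

template <typename T>
std::size_t do_get_size(const T& obj)
{
    return do_get_size_impl(0, obj); // 0 is an int, so the member overload wins
}

// Hypothetical type with only a nonmember size().
namespace other
{
    struct bag
    {
        std::size_t count;
    };

    inline std::size_t size(const bag& b)
    {
        return b.count;
    }
}

inline void example_do_get_size()
{
    const std::vector<int> vec{1, 2, 3};
    const int array[] = {1, 2, 3, 4};
    const other::bag bag{5};

    assert(do_get_size(vec) == 3);   // member size()
    assert(do_get_size(array) == 4); // array overload
    assert(do_get_size(bag) == 5);   // nonmember size() via ADL
}
```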

This is similar to what the new std::size does. However, this is a lot of boilerplate.
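For the common case the boilerplate can be avoided with C++17’s std::size from <iterator>. A short sketch follows, where count_elements is a made-up name for the generic function. Note that std::size only covers types with a member size() and built-in arrays; it does not look for a nonmember size() via ADL, so it would not help with the list from above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator> // std::size (C++17)
#include <vector>

// Hypothetical generic function: works for containers and built-in arrays.
template <typename Container>
std::size_t count_elements(const Container& cont)
{
    return std::size(cont);
}

inline void example_std_size()
{
    const std::vector<int> vec{1, 2, 3};
    const int array[] = {1, 2, 3, 4};
    const std::array<int, 2> std_array{{1, 2}};

    assert(count_elements(vec) == 3);
    assert(count_elements(array) == 4);
    assert(count_elements(std_array) == 2);
}
```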

Prefer nonmember nonfriends?

So the algorithm from the beginning, which decides when to make a function member or not, doesn’t work as we need to acknowledge syntax. Instead, a revised algorithm would look something like this:

  • If the function is one of the operators =, ->, [], or (), which must be members: Make it a member.
  • Else if the function needs to be chained: Make it a member.
  • Else if:

    • a) the function needs a different type as its left-hand argument (as do operators>> or <<, for example);
    • or b) it needs type conversions on its leftmost argument;
    • or c) it can be implemented using the class’s public interface alone;
    • or d) convention requires that it is a nonmember function; and there is no convention to make it a member function:

    Make it a nonmember (and friend if needed in cases a) and b) and d)). If it needs to behave virtually: Add a virtual member function to provide the virtual behavior, and implement the nonmember in terms of that.

  • Else: Make it a member, unless there are other reasons.

And also a guideline for generic algorithms:

Make sure that it is the right choice if you call a member function or a nonmember function in a generic context. If both could be valid, introduce a wrapper to allow both, or use a customization point that sucks less.

This isn’t a nice guideline, however.

But there’s a potential solution:

Unified call syntax

The general problem is that member function call syntax is different from nonmember function call syntax, although this really shouldn’t matter at all! There is no benefit to having an awkward difference between member functions and nonmember functions, as the difference does not expose any useful information.

Member function syntax is nicer if you want to chain things or if there is one special argument. Nonmember function syntax is nicer in all other situations. It would be great if you could simply switch between the two syntax forms.

That’s the idea behind a proposed unified call syntax. It would allow exactly that, but hasn’t been accepted so far.

There are various approaches, to paraphrase N4474:

  1. Generalize x.f(y) to call f(x, y), if there is no matching member function.
  2. Generalize f(x, y) to call x.f(y), if there is no matching free function.
  3. Do both 1. and 2.
  4. When writing x.f(y), consider all member functions and free functions and use overload resolution to determine which one should be called. Vice-versa for f(x, y).
  5. When writing x.f(y) or f(x, y) first look for a member function, then a free function.

Each approach has its own advantages and disadvantages, so it is difficult to pick one. As far as I know, the current approach is 3, but I don’t know the exact status.

I really hope this will get into C++ one day. Because right now, the situation is messy.

Conclusion

Prefer nonmember nonfriend is a reasonable guideline, but sadly not universally applicable. As nonmember functions have a very different calling syntax, the most general guideline is probably:

Use a member function if you think that most users want to call it as x.f(y). Use a nonmember function if you think that most users want to call it as f(x, y).

But if we get a unified call syntax, the guideline can be the one from C++ Coding Standards:

Use a nonmember function if you need type conversions on the first argument or don’t need access to private data. Use a member function only if you do need access to private data.

And then each user can decide how to call it. This is what’s actually needed.

Appendix: In a perfect world

This is just me ranting about language design choices, with lots of personal opinion. You should probably stop reading.

I think member functions were a mistake.

In addition to the discussed problems, they also have a weird definition syntax with trailing const and &&, and they follow slightly different rules than free functions.

Furthermore, they solve a problem that could be solved with three separate features:

  • Give certain functions access to private data of a class without marking them as friend. In a perfect world - which of course has modules! - this could be as easy as all functions in a module, or something like Rust’s impl block: all functions in there have access to a class’s private data (AFAIK).

  • Allow polymorphic behavior for free functions. We could mark one - or even many! - arguments with virtual and then override the function for derived types. Or use some other mechanism.

  • Allow automated access to members of one argument. This could be solved by introducing a mechanism where you can name any parameter this, and name lookup will then consider its members. This solves the tedious object prefix.

With those we could have everything member functions offer us, but simpler and cleaner. Universal function call syntax would then allow the caller - not the implementer - to decide what a function call should look like, depending on the situation.

Sadly, this will probably not be possible in C++, so the best thing to hope for is unified function call syntax.

