Naming Things: Implementer vs. User Names

27 Nov 2019 by Jonathan

I wanted to write this blog post about (a specific part of) naming things back in July, but ironically I didn’t have a name for the symptom I wanted to describe. I only found a good name when I attended Kate Gregory’s talk on naming at CppCon, and now I finally have the time to write my thoughts down.

So I want to write about naming. In particular, about the phenomenon that sometimes a name is a perfect description of what a function does, yet it is totally useless.

Case Study 1: `std::log2p1()`

C++20 adds a couple of bit manipulation functions to the header <bit>. One of them is std::log2p1. It looks like this:

#include <cmath> // for std::log2

int log2p1(int x)
{
    if (x == 0)
        return 0;
    else
        return 1 + int(std::log2(x));
}

It basically returns the binary logarithm plus one, hence the name std::log2 plus 1.

This seems useful…?

It is. std::log2p1(x) is the number of bits necessary to store the value x. This is a very useful function, but just looking at the name doesn’t really make it apparent.
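To make the “number of bits” reading concrete, here is a minimal sketch that counts the bits directly instead of going through a floating-point logarithm. The name bits_needed and the std::uint32_t parameter are assumptions made for illustration; this is not how the standard library implements it.

```cpp
#include <cstdint>

// Hypothetical helper (name and signature are illustrative only):
// returns how many bits are needed to store `value`.
constexpr int bits_needed(std::uint32_t value)
{
    int bits = 0;
    while (value != 0)
    {
        ++bits;      // one more bit is occupied
        value >>= 1; // drop the lowest bit
    }
    return bits;
}

static_assert(bits_needed(0) == 0, "zero needs no bits");
static_assert(bits_needed(1) == 1, "0b1");
static_assert(bits_needed(5) == 3, "0b101");
static_assert(bits_needed(8) == 4, "0b1000");
```

Written this way, the special case for zero falls out of the loop, and the name says what the caller wants to know rather than how it is computed.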

Case Study 2: `std::bless()`

Allow me to interrupt this non-technical blog post about naming for something completely different.

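Quick refresher about the C++ object model: when you have a pointer, you are only allowed to do pointer arithmetic if that pointer points into an array. This makes sense: if you just have a pointer to an arbitrary object, you shouldn’t do arithmetic on it, because there are no neighboring objects. (For this purpose a single object counts as an array of one element, so you may form the one-past-the-end pointer, but nothing beyond that.)

int obj = 0;
int* ptr = &obj;

ptr += 2; // UB: more than one past the end of obj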

However, this makes a lot of existing C++ code undefined behavior. Consider this potential simplified implementation of std::vector<T>::reserve():

void reserve(std::size_t n)
{
    // allocate new memory for our objects
    auto new_memory = (T*) ::operator new(n * sizeof(T));

    // move objects from old buffer to new buffer
    …

    // update buffer
    auto size = this->size();
    begin_ = new_memory;            // UB
    end_   = new_memory + size;     // UB
    end_capacity_ = new_memory + n; // UB
}

We’re allocating memory, moving our objects over, and then updating the pointers to point to the new memory. However, almost every line of this function is undefined behavior: we’re performing pointer arithmetic on memory that is not an array!

The issue here is obviously not with the programmer, because clearly this should be allowed, but with the C++ standard itself. So P0593 proposes to fix the standard by giving certain functions–like ::operator new, std::malloc–the ability to automagically create an array in the returned memory, if required. Then we have a pointer to an array (of e.g. char objects), and can safely do pointer arithmetic.
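As a minimal sketch of the pattern in question, assuming a buffer of plain int objects: the helper names make_iota_buffer and destroy_buffer are made up for this example and do not come from P0593 or any library. The expression memory + i is exactly the kind of pointer arithmetic on freshly allocated storage that the proposal is meant to make well-defined.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: allocates raw storage for `n` ints
// and fills it with 0, 1, 2, ...
int* make_iota_buffer(std::size_t n)
{
    // raw memory; no int objects have been created by us yet
    auto memory = static_cast<int*>(::operator new(n * sizeof(int)));

    for (std::size_t i = 0; i != n; ++i)
    {
        // `memory + i` is pointer arithmetic on the allocated storage
        ::new (static_cast<void*>(memory + i)) int(static_cast<int>(i));
    }

    return memory;
}

// Hypothetical counterpart: int is trivially destructible,
// so releasing the storage is all that is needed.
void destroy_buffer(int* memory)
{
    ::operator delete(memory);
}
```

Code like this has always worked in practice; the point of the proposal is that the standard should also say so.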

We’re almost back to talking about naming now.

Sometimes we’re in a situation where we need to do pointer arithmetic, but have memory that did not come from one of those special functions that implicitly create objects for us. For example, when writing the deallocate() function of a memory allocator, we’re given dead memory: no object is living inside it, yet we need to do pointer arithmetic. For that, P0593 used to propose a function std::bless(void* ptr, std::size_t n) (and another function, also called bless, but I’m not talking about that one here). Calling this function has no actual effect on a physical computer, but it creates the necessary objects to allow pointer arithmetic for the purposes of the abstract machine.

And std::bless was a placeholder name.

There, we’re back to naming now.

So in Cologne, LEWG was tasked with finding a new name for this function. Two candidates were implicitly_create_objects() and implicitly_create_objects_as_needed()–because that’s exactly what the function does.

I didn’t like those names.

Case Study 3: `std::partial_sort_copy()`

This example is taken from Kate’s talk.

There is std::sort which sorts a range in-place:

std::vector<int> vec = {3, 1, 5, 4, 2};
std::sort(vec.begin(), vec.end());
// vec == {1, 2, 3, 4, 5}

There is also std::partial_sort which sorts part of a range in-place:

std::vector<int> vec = {3, 1, 5, 4, 2};
std::partial_sort(vec.begin(), vec.begin() + 3, vec.end());
// vec == {1, 2, 3, ?, ?} (don't know whether it is 4,5 or 5,4)

And then there is std::partial_sort_copy which sorts part of a range, but not in-place:

const std::vector<int> vec = {3, 1, 5, 4, 2};
std::vector<int> out;
out.resize(3);
std::partial_sort_copy(vec.begin(), vec.end(),
                       out.begin(), out.end());
// out == {1, 2, 3}

Kate argues that std::partial_sort_copy is a less than ideal name, and I agree.

Implementer Names vs. User Names

None of the names discussed above are bad: they are perfectly valid descriptions of what the functions do. std::log2p1() computes log2 + 1, implicitly_create_objects() implicitly creates objects, and std::partial_sort_copy() does a partial sort but copies the output.

Yet I dislike all of those names. Why is that?

I dislike those names, because they aren’t useful. Yes, they tell you what the function actually does, but this is not the information you actually want!

You are not sitting there thinking “at this point I need to compute the binary logarithm plus one”, you’re thinking “now I need to know how many bits are required to store this value”. This means you’re reaching for a function called something like bit_width, not log2p1. By the time you’ve made the connection to “binary logarithm plus one”, you’ve already written that yourself (and probably forgotten about special-casing zero). And even if you find std::log2p1, the next person (or future you) looking at the code again has to make the connection between binary logarithm and bit width. Something like bit_width() would be a more self-explanatory name.

Similarly, you don’t want to “implicitly create objects” or do a partial sort in a copy, you want to reuse the memory or get the top N values sorted. Something like recycle_storage(), which was another candidate name for std::bless, or top_n_sorted() would be a more intuitive name.

Kate used the term implementer name for describing std::partial_sort_copy(), but it also applies to std::log2p1() and implicitly_create_objects(). They are perfectly natural names when looking at the implementation of a function.

However, they are not the user name: the name a user would use to describe this function. As a user, you’re looking for a function name describing what you want, you don’t care about how the function is implemented. You’d name a function in a way that accomplishes what you’re trying to do–compute the bit_width(), recycle_storage(), or get the top_n_sorted().
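To illustrate the difference, here is a sketch of what a user-named wrapper around std::partial_sort_copy could look like. The function top_n_sorted and its signature are hypothetical and not part of the standard library; the sketch also assumes T is default-constructible, because the result vector is sized up front.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (name and signature are illustrative only):
// returns the `n` smallest elements of `input` in ascending order
// and leaves `input` untouched.
template <typename T>
std::vector<T> top_n_sorted(const std::vector<T>& input, std::size_t n)
{
    // never ask for more elements than the input has
    std::vector<T> result(std::min(n, input.size()));

    std::partial_sort_copy(input.begin(), input.end(),
                           result.begin(), result.end());
    return result;
}
```

A call such as top_n_sorted(vec, 3) states the intent at the call site, while the implementer name stays hidden inside the wrapper.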

Just looking at the specification of a function and naming it based on that can create a disconnect between the implementer’s point of view and the user’s point of view. You always need to keep in mind how the function is going to be used.

It sounds like an obvious guideline, but judging by std::log2p1(), it apparently wasn’t followed. And sadly it isn’t always that simple.

Case Study 4: `std::popcount()`

This brings me to std::popcount() which is, just like std::log2p1(), a C++20 addition to <bit>. According to all naming rules, popcount is a terrible name. Unless someone already knows about it, they will be unable to guess what the function does. Not only does it use a confusing abbreviation (pop has nothing to do with push), the full name–population count–doesn’t really help either.

But it is a perfect description of the function. What does std::popcount() do? It lowers to the popcount instruction.

popcount is an implementer name.

Yet, here the disconnect between implementer and user isn’t as jarring: popcount is the accepted name for a function that counts the number of set bits. If you’re doing bit manipulation and know about the domain, this is the name you’ll reach for.
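For readers outside that domain, a minimal sketch of what “counting the number of set bits” means: count_set_bits is a hypothetical portable fallback written for this post, not the standard library’s implementation, which will typically use the hardware instruction instead.

```cpp
#include <cstdint>

// Hypothetical portable fallback (name is illustrative only):
// returns the number of bits in `value` that are set to 1.
constexpr int count_set_bits(std::uint32_t value)
{
    int count = 0;
    while (value != 0)
    {
        value &= value - 1; // clears the lowest set bit
        ++count;
    }
    return count;
}

static_assert(count_set_bits(0) == 0, "no bits set");
static_assert(count_set_bits(11) == 3, "0b1011 has three set bits");
```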

Of course, this raises the question: should you name your function for beginners or experts?

A Happy End?

P1956 (will be public in a couple of days) proposes a rename of std::log2p1() to std::bit_width(). It is on track to be applied to C++20.

Similarly, std::ceil2 and std::floor2 are renamed to std::bit_ceil() and std::bit_floor(), which is good, as the old names are also bad (but for different reasons).

In Cologne, LEWG picked neither implicitly_create_objects[_as_needed] nor recycle_storage for std::bless, but instead decided to remove the function altogether. The same can be accomplished by calling placement-new of a byte array, so the function is not necessary. I dislike that, because it doesn’t make the intent as clear as a call to std::recycle_storage() would (which was my favorite).

The other std::bless() function is still there, but now called start_lifetime_as, which I like. It will likely be part of C++23.

And of course, std::partial_sort_copy can’t be renamed–it’s been part of C++ since ’98. But still, the worst offender, std::log2p1, will be fixed.

When naming things, keep in mind how it will be used, what users want to accomplish with it. As Kate said: naming requires empathy.
