Guidelines for constructor and cast design

26 Jan 2018 by Jonathan

A while back (but sadly not too many blog posts ago) I wrote about explicit constructors and how to handle assignment. In that blog post, I made the assumption that you most likely want to have explicit single argument constructors.

But when do we actually want implicit single argument constructors?

Let’s consider the broader question: How should I design a cast operation for my user-defined type? And how should I design a constructor?

But first, something different: what is the difference between a cast and a constructor?

Casts vs constructors

It might seem silly to ask for the difference between a cast and a constructor.

I mean, this is a cast:

auto i = static_cast<int>(4.0);

And this invokes a constructor:

auto my_vector = std::vector<int, my_allocator<int>>(my_alloc);

However, the same cast can look like a constructor invocation:

auto i = int(4.0);

And the constructor can look like a cast:

auto my_vector = static_cast<std::vector<int, my_allocator<int>>>(my_alloc);

So what is the difference?

It is a semantic difference, not a syntactic difference.

A constructor is any operation that takes any number of arguments and creates a new object of a given type using those arguments. The value of the new object is created using the values of the arguments, but there is no direct connection between the argument values and the new value. Constructors in C++ are usually implemented using, well, constructors — the C++ language feature. But they don’t have to, as we’ll see.

A cast operation also follows that definition of a constructor. But it is special in two ways: First, it only and always takes a single argument of a different type than the one returned. Second, it fundamentally doesn’t change the value of the argument, just the type.

Let me elaborate on the last one a bit. For the sake of this discussion, a value is the abstract concept like the number four. The static_cast<int>(4.0) takes that value stored as a double and returns an int object still containing the same value — the number four. The value didn’t change, only the representation of that value changed.

Of course, this is not always possible. If we write static_cast<int>(4.1), the value “number 4.1” cannot be stored in an int. This is an example of a narrowing cast. How the cast operation behaves in this situation — throw an exception, round to the “nearest value” whatever that is — is up to the implementation. In contrast, a wide cast would be something like static_cast<long>(4): All possible values of an int can be represented as a long, so it will always succeed.

Casts in C++ are usually implemented with a conversion operator or a free function. But note that they can also be implemented using a C++ constructor, which led to the confusion earlier.

Using those definitions, the following operations are all casts. While they do create a new object, the stored value itself is fundamentally the same.

// the double to int example from above
auto i = static_cast<int>(4.0);

// convert the value "Hello World!" from a character array to a `std::string`
std::string str = "Hello World!";

// convert some pointer value to a unique pointer of the same value
// value didn't change, only ownership is new
std::unique_ptr<int> unique_ptr(some_ptr);

// convert the integer value from above to an optional
// again: no change in value, just represented in a new type that can fit an additional value
std::optional<int> my_opt(i);

Note that a cast operation doesn’t need to contain the word “cast” anywhere!

But here we are using a constructor:

// the vector value from above
auto my_vector = std::vector<int, my_allocator<int>>(my_alloc);

// create a string using an integer and a character
std::string my_string(10, 'a');

// create a string stream using the string from above
std::stringstream stream(my_string);

So with the technicality out of the way, let’s take a closer look at the way casts are handled in C++.

Implicit conversions

A single argument constructor that isn’t marked explicit or a non-explicit conversion operator can be used in an implicit conversion. Basically, the compiler will adjust the types without you needing to do anything. Sometimes you don’t even realize it!

Implicit conversions don’t require any extra typing, so they will happen accidentally at some point. So only add new implicit conversions when they have the following properties:

They are wide conversions: Preconditions require thinking by the programmer, but implicit conversions don’t.
They are reasonably cheap: They will be used a lot, so it is best if they’re cheap.
The benefits of saved typing are significant: When in doubt, don’t add a new implicit conversion.

A good example of an implicit conversion is T → std::optional<T>. It is relatively cheap, there are no preconditions and it should be possible to change a function taking a T at some point to a function taking an optional T.

A negative example would be unsigned → int — it leads to a lot of problems! — or even const char* → std::string — it requires a non-null pointer and is expensive due to a dynamic memory allocation. But the first was inherited from C and the second is just too convenient.

Directly following from that guideline is this one:

Make single-argument constructors explicit by default!

clang-tidy rule google-explicit-constructor really helps.

Although they should have used an attribute [[implicit]] instead of requiring a comment /* implicit */.
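As a minimal sketch of the guideline, consider the following. The buffer type and both functions are hypothetical names made up for illustration: a size is not the same value as a buffer, so that constructor is explicit, while T → std::optional<T> stays implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <type_traits>

// Hypothetical type: a buffer that is created from its size.
class buffer
{
public:
    // explicit: the number 42 and "a buffer of 42 bytes" are different values,
    // so `buffer b = 42;` and `consume(42)` should not compile
    explicit buffer(std::size_t size) : size_(size) {}

    std::size_t size() const noexcept { return size_; }

private:
    std::size_t size_;
};

// the conversion has to be spelled out by the caller
static_assert(!std::is_convertible<std::size_t, buffer>::value);
static_assert(std::is_constructible<buffer, std::size_t>::value);

std::size_t consume(const buffer& b) { return b.size(); }

// T -> std::optional<T> is implicit: wide, cheap and convenient
int value_or_zero(std::optional<int> opt) { return opt.value_or(0); }
```

With these definitions, consume(buffer(42)) compiles but consume(42) does not, while value_or_zero(4) works without any extra typing.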

C++ casts

In C there was only a single syntax to convert an object of one type into another type: (new_type)old_object. C++ as a bigger and better language added four new ones:

static_cast<new_type>(old_object) for an - eh - “static” (?) conversion, whatever that is
const_cast<new_type>(old_object) for adding/removing const-ness
reinterpret_cast<new_type>(old_object) for interpreting memory in a different way
dynamic_cast<new_type>(old_object) for a bunch of conversions related to polymorphic class hierarchies

It also has a new syntax for C style casts — T(old_object) which looks like a constructor call, but may do all C style conversions — but let’s ignore C style casts, they do nothing that can’t be done with the C++ casts.

Of course, C++ being C++: This isn’t true, they can convert an object to a private base class. But you shouldn’t know that.

Of the four new C++ cast operations, I only like one. Can you guess which one?

Wrong, it’s reinterpret_cast.

“But why?”, you ask, “reinterpret_cast is an evil tool, you shouldn’t use that.”

This might be true, but reinterpret_cast only does one thing: It changes a pointer type. The other casts do multiple things at once.

To be fair, reinterpret_cast can also convert a pointer to an integer. But I wanted to say something positive about one cast.

Consider const_cast: It has two similar yet very different jobs — it can be used to add constness and to remove constness. The first is a completely harmless situation and used to help overload resolution sometimes. The second is a dangerous road to undefined behavior if you don’t know what you’re doing. Yet the two modes share the same function name!

C++17 adds std::as_const() as a harmless way of adding constness, which is good, but 20 years too late.

To be fair, I can count my number of const_cast uses with one hand, so it doesn’t really matter.
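To illustrate the harmless half of const_cast, here is a small sketch. The overload set which() is a hypothetical helper that only reports which overload was selected; std::as_const is the C++17 utility from <utility>.

```cpp
#include <cassert>
#include <string>
#include <utility> // std::as_const

// Hypothetical overload set: the result tells us which overload was picked.
std::string which(std::string&) { return "non-const"; }
std::string which(const std::string&) { return "const"; }

std::string demo_as_const()
{
    std::string s = "hello";
    // adding const is harmless, and the name says that nothing else happens
    return which(std::as_const(s));
}

std::string demo_const_cast()
{
    std::string s = "hello";
    // same effect, but the reader has to compare the types to see
    // that constness is added here and not removed
    return which(const_cast<const std::string&>(s));
}
```

Both functions select the const overload. The dangerous mode uses the very same keyword: removing const from an object that was declared const and then writing to it is undefined behavior.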

dynamic_cast is similar: Depending on the types it is used with, it can cast up the hierarchy, down the hierarchy, across entire classes or give you a void* to the most derived object. Those are separate pieces of functionality, so why pack them all into one cast? They should have been up_cast, down_cast, cross_cast and get_most_derived_ptr functions instead.

But the worst of them is static_cast. It can be used to:

convert between integer types
convert between floating point types
convert between integer and floating point types
convert between void* and pointer types
convert between enum and its underlying integer type
convert between (not-too-complicated™) base and derived classes
convert an lvalue to an rvalue (std::move)
convert between any two types provided there is a suitable constructor or conversion operator

This is probably not an exhaustive list.

That is a lot of different conversions: some are narrowing (float → int), some are wide (T* → void*). Some are cheap (uint32_t → uint64_t), some are expensive (std::string_view → std::string). Just by looking at the cast in the source code, it is impossible to know the semantics.

And more importantly, it is not really “static”.

In a way, this is only slightly better than an implicit conversion: It requires the writing programmer to say “yeah, go ahead”, but it doesn’t help the reading programmer much. A call to truncate<int>(my_float) or round<int>(my_float) is much more expressive than a static_cast<int>(my_float), especially for user-defined types.

As such I give this goal:

Following Sean Parent, this is not a guideline because it is not always possible to follow it and it is not as bad, if you don’t.

Don’t use static_cast: Write your own functions to do static_cast conversions, truncate, round, to_underlying(my_enum) etc. and use those instead. This is especially true for user-defined types, see below.

Again, a consequence of the goal is this guideline:

Don’t use explicit constructors to implement conversions (and don’t use explicit conversion operators).

Of course, absolutely do use explicit! Just not where you actually intend a usage of the form static_cast<T>(my_obj).

A notable exception to that rule is explicit operator bool: It basically provides the sane implicit conversions, so if (foo) and !foo works, but i + foo doesn’t.
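The exception can be sketched like this. The handle class is a hypothetical example type, not something from a real library:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical handle type that is either valid or not.
class handle
{
public:
    explicit handle(int fd) : fd_(fd) {}

    // only usable where a bool is required by the language: if, !, &&, ||, ...
    explicit operator bool() const noexcept { return fd_ >= 0; }

private:
    int fd_;
};

// no implicit conversion, so `1 + h` or `bool b = h;` does not compile
static_assert(!std::is_convertible<handle, bool>::value);

bool is_open(const handle& h)
{
    if (h) // okay: contextual conversion to bool
        return true;
    return false;
}
```

Here if (h) and !h compile, because the language asks for a bool in those places, but arithmetic such as 1 + h is rejected.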

Implementation of user-defined conversions

So if not using explicit constructors, how should you add new non-implicit conversions?

Well, use a function that takes an object of the source type and returns a new object of the destination type. A function has one big benefit over a constructor or conversion operator: It has a name.

As seen above, you can use that name to provide useful contextual information:

Is this a narrow or wide conversion?
If it is narrow, what’s the behavior if an error occurs?
etc.

A bad name is static_cast<int>(my_float). A better name is gsl::narrow_cast<int>(my_float), which at least tells you that the conversion is narrow. A good name is truncate<int>(my_float), because it also tells you what happens in the error case.

The behavior in the error case is basically the only thing of a conversion function that is not obvious.

Note that a conversion function doesn’t need to have a _cast suffix. Only use it if there is no better name and/or it is a wide conversion where you don’t need to encode error information.
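Such named conversion functions could look roughly like the following sketch. All three helpers are assumptions made for illustration: the rounding function is called round_to here only to avoid a clash with the C library's round, and none of them checks whether the result fits into the target type.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical helpers: the name encodes what happens to values
// that cannot be represented exactly in the target type.

// narrow: drops the fractional part (rounds toward zero)
template <typename Int>
Int truncate(double value)
{
    return static_cast<Int>(std::trunc(value));
}

// narrow: rounds to the nearest integer, halfway cases away from zero
template <typename Int>
Int round_to(double value)
{
    return static_cast<Int>(std::round(value));
}

// wide: every enum value fits into its underlying integer type
template <typename Enum>
constexpr std::underlying_type_t<Enum> to_underlying(Enum e) noexcept
{
    return static_cast<std::underlying_type_t<Enum>>(e);
}

// example enum for to_underlying
enum class color : unsigned char
{
    red   = 1,
    green = 2
};
```

The static_cast is still there, but it is hidden in one place behind a name that tells the reader what the conversion does.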

C++ Constructors

I have much more positive things to say about C++ constructors than C++ casts: After all, they’re the other half of the best feature in C++ — destructors.

So I’ll just repeat what others have said in this guideline:

Add a constructor to put an object in a valid, well-formed state: As such, it should take enough arguments to do that.

A “valid, well-formed state” is a state where the object is usable enough: you should be able to call the basic getter functions, for example.

However, this is just the bare minimum: You should also add other constructors to put the object in some convenient state.

Take this code, for example:

std::string str; // default constructor puts it into a well-formed state

// now set the actual contents
str = "Hello ";
str += std::to_string(42); // `std::to_string` is a cast, BTW

Something like this is definitely more convenient:

std::string str = "Hello " + std::to_string(42);

// str has the actual state already

However, following this to the extreme leads to something like this:

std::vector<int> vec(5, 2);

I actually don’t know what this does, I have to look it up every time.

Like with static_cast, there is no room to provide any additional information about the parameters. This is problem one with constructors.
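For reference, this is what the two spellings do. The function names are made up here to carry the information that the constructor call itself cannot:

```cpp
#include <cassert>
#include <vector>

std::vector<int> five_twos()
{
    // (count, value): five elements, each with the value 2
    return std::vector<int>(5, 2);
}

std::vector<int> five_and_two()
{
    // initializer list: two elements, 5 and 2
    return std::vector<int>{5, 2};
}
```

Nothing at the call site tells you which argument is the count and which is the value, and switching from parentheses to braces silently changes the meaning.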

The other one is this one: Suppose you’re creating some form of immutable object that needs to be initialized with a lot of state. You really shouldn’t pass in a ton of parameters to the constructor!

Only add constructors if the meaning of the parameters is clear and there aren’t too many parameters.

What should you do instead?

Well, there are two alternatives.

Named constructors

A named constructor is a free function or static member function that is used to construct the object. Again: you can give it a proper name!

For example, consider a file class. It has two main constructors: one that creates a new file and one that opens an existing one. However, both take just the file path, so it is even impossible to use constructors for it, as they cannot be overloaded!

But you can give them different names:

class file
{
public:
  static file open(const fs::path& p);
  static file create(const fs::path& p);
};

…

auto f1 = file::open(…);
auto f2 = file::create(…);

However, named constructors are not as ergonomic as regular constructors. You can’t use them with emplace(), for example.

A different implementation uses constructors and simply adds tags to give them names. Now they can be used with emplace-like functions.

class file
{
public:
  // a static constexpr data member needs an initializer
  static constexpr struct open_t {} open{};
  file(open_t, const fs::path& p);

  static constexpr struct create_t {} create{};
  file(create_t, const fs::path& p);
};

…

auto f1 = file(file::create, …);
auto f2 = file(file::open, …);

Which implementation of named constructors you use is up to you. I tend to use the static function one more, but this is just my personal taste. You should definitely consider using one of the two variants if you have complex constructors.
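To show how the tag variant works with emplace-like functions, here is a simplified stand-in for the file class. It is an assumption-laden sketch: it takes a std::string instead of a path, does not touch the file system, and only records which constructor was selected. It relies on C++17, where static constexpr members are implicitly inline.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in for the file class: no actual file system access.
class file
{
public:
    struct open_t {};
    struct create_t {};
    static constexpr open_t   open{};
    static constexpr create_t create{};

    file(open_t, std::string path) : path_(std::move(path)), created_(false) {}
    file(create_t, std::string path) : path_(std::move(path)), created_(true) {}

    bool               was_created() const noexcept { return created_; }
    const std::string& path() const noexcept { return path_; }

private:
    std::string path_;
    bool        created_;
};

// hypothetical helper showing the tag passed through emplace()
bool emplace_created(std::optional<file>& slot, std::string path)
{
    // the tag is forwarded like any other argument and selects the constructor
    slot.emplace(file::create, std::move(path));
    return slot->was_created();
}
```

A static member function like file::create(path) could not be used this way, because emplace() can only forward arguments to a constructor.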

The builder pattern

If your constructors get too complex, the builder pattern helps. Instead of having just one creation function, you have an entire class: the builder. It contains many functions to set the different attributes and a finish() member function that returns the finalized object.

I use it for complex classes in cppast, because they are not mutable, so need to be completely created with all properties. Here is the cpp_class object, for example:

class cpp_class
{
public:
    class builder
    {
    public:
        // specify properties that always need to be provided
        explicit builder(std::string name, cpp_class_kind kind, bool is_final = false);

        // mark the class as final
        void is_final() noexcept;

        // add a base class
        cpp_base_class& base_class(std::string name, std::unique_ptr<cpp_type> type,
                                   cpp_access_specifier_kind access, bool is_virtual);


        // add a new access specifier
        void access_specifier(cpp_access_specifier_kind access);

        // add a child
        void add_child(std::unique_ptr<cpp_entity> child) noexcept;

        // returns the finished class
        std::unique_ptr<cpp_class> finish(const cpp_entity_index& idx, cpp_entity_id id,
                                          type_safe::optional<cpp_entity_ref> semantic_parent);

    private:
        std::unique_ptr<cpp_class> class_;
    };

    … // but no public constructors
};

Note that the builder pattern has a couple of advantages over “inlining” the setter functions into the class:

The class itself can be made immutable, it doesn’t need a lot of setters.
Members don’t need to be default constructible: The builder can store them as std::optional<T> or ts::deferred_construction<T> and assert in the finish() function that they have been set. Then the actual class object can be created.
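The second advantage can be sketched as follows. The connection_info type and its attributes are hypothetical; the builder stores each attribute as a std::optional and asserts in finish() that all of them were set.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical immutable type without setters and without a default state.
class connection_info
{
public:
    class builder;

    const std::string& host() const noexcept { return host_; }
    int                port() const noexcept { return port_; }

private:
    // only the builder creates objects
    connection_info(std::string host, int port) : host_(std::move(host)), port_(port) {}

    std::string host_;
    int         port_;
};

class connection_info::builder
{
public:
    builder& host(std::string h) { host_ = std::move(h); return *this; }
    builder& port(int p) { port_ = p; return *this; }

    // precondition: every attribute has been set
    connection_info finish()
    {
        assert(host_ && port_);
        return connection_info(std::move(*host_), *port_);
    }

private:
    // empty until the corresponding setter has been called
    std::optional<std::string> host_;
    std::optional<int>         port_;
};
```

A call then reads like connection_info::builder().host(…).port(…).finish(), and the finished object has no way of being modified afterwards.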

One downside of the builder pattern is added verbosity. And if the created object is not polymorphic and returned by value, the nested class can’t simply have a member of the object it is currently creating:

class foo
{
public:
    class builder
    {
        foo result_; // error: foo is an incomplete type at this point

        …
    };

    …
};

To work around that, either the builder must contain all members individually or must be defined outside the class:

class foo
{
public:
  class builder;

  …
};

class foo::builder
{
  foo result_; // okay

  …
};

But apart from those downsides, the builder pattern is a useful tool. However, it is only going to be used in rare situations.

Conclusion

When writing your own types, think about the constructors and cast operations you want to provide.

In particular:

Make single-argument constructors explicit and never use them for casting
Only add implicit conversions if you’re absolutely sure they’re necessary
Prefer to implement cast operations as suitable named non-member functions
Consider named constructors if the parameters are confusing
Consider the builder pattern if you have complex constructors

Also try to avoid static_cast and use specialized casting functions instead. They’re more readable as they clearly show what is done.

Following these rules, you get interfaces that are easier to use and make it more obvious what they do.

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.