Memory 0.6: Composition and Joint Allocators

21 Nov 2016 by Jonathan

If you’ve been a long reader of my blog, you might remember my memory library. I haven’t forgotten about it, even though the 0.5 release was in February! After three patches and a long pause in development to focus on standardese, I’ve finally finished the 0.6 release. It mainly provides two major features: composition and joint allocators.

foonathan/memory is a library providing various memory allocators and adapter classes. Those allocators use a new RawAllocator concept that is simpler than STL’s Allocator and allows better control over the allocation aspect. Adapters and traits ensure compatibility with the existing model, allowing usage in STL or other containers.

Composition

Andrei’s talk made the idea of compositioning allocators quite popular. He proposed a library where you have many allocator “building blocks” and you can plug them together to make powerful combinations.

Thanks to my BlockAllocator concept - check out the 0.5 release post or my Meeting C++ talk for infos about it, you can already combine some allocators. For example, you can use my virtual_block_allocator to create a memory_stack that is virtual memory aware.

But this is not the kind of composition he described. In his library he could, for example, wrote a fallback_allocator. It is an adapter that takes two allocators. It first tries the first one and if that fails, it uses the second allocator.

But if the allocation of a RawAllocator fails, it must not return nullptr. So checking whether it failed will boil down to catching the exception it throws instead. This is slow (and only works when the library is compiled with exception support), but there is an even bigger problem: The deallocation. It must know from which allocator the memory came and deallocate it there. This is not supported for the current RawAllocator, because it can’t be supported for all allocators: For new_allocator - a wrapper over ::operator new, how can it detect whether the memory was allocated by it in the deallocation?

Instead, I’ve added a new concept, a composable RawAllocator. This is a RawAllocator that also provides try_allocate_node/array and try_deallocate_node/array functions. The try allocation functions return nullptr on failure, instead of throwing an exception/aborting/… The try deallocate function checks whether the memory came from the allocation, and only deallocates it if it did. It returns true if it could deallocate, false otherwise.

All allocators that can be composable are now composable. This allows implementing the fallback_operator:

void* fallback_allocator::allocate_node(std::size_t size, std::size_t alignment)
{
    // first try default
    auto ptr = get_default_allocator()
                 .try_allocate_node(size, alignment);
    if (!ptr)
        // default was not successful
        // this is not composable, so guaranteed to be succesful
        ptr = get_fallback_allocator()
                 .allocate_node(size, alignment);
    return ptr;
}

void fallback_allocator::deallocate_node(void* ptr,
        std::size_t size, std::size_t alignment) noexcept
{
    // first try default
    auto res = get_default_allocator()
                   .try_deallocate_node(ptr,
                                        size, alignment);
    if (!res)
        // could not be allocated by default
        get_fallback_allocator()
            .deallocate_node(ptr, size, alignment);
}

The actual implementation uses the new composable_allocator_traits instead of directly calling the member functions.

In addition to fallback_allocator, I’ve also implemented segregator.

segregator does not require a composable allocator, only fallback_allocator does that.

That is an allocator adapter taking one or more Segregatables and a RawAllocator. A Segregatable is a simple class that owns an allocator and can decide for each allocation whether this allocator should be used. The most basic Segregatable is the threshold_segregatable. It handles allocation up to a given max size.

The segregator now ask each Segregatable in turn if it wants that allocation. It uses the first one that does. If no Segregatable wants it, it uses the RawAllocator for the allocation:

auto seg = memory::make_segregator(memory::threshold(16u, std::move(small_alloc)),
                                   memory::threshold(128u, std::move(medium_alloc)),
                                   std::move(big_alloc));
seg.allocate_node(8, 4); // uses small_alloc
seg.allocate_node(32, 8); // uses medium alloc
seg.allocate_node(4_KiB, 8); // uses big_alloc

threshold() is a simple function that creates a threshold_segregatable.

I’ve also added the null_allocator: The allocator that allocates nothing, where every call results in an exception. It is useful for segregator: Pass it as final RawAllocator to ensure that at least some Segregatable handles it.

Joint memory allocations

I’ve also added facilities for joint memory allocations inspired by this great post. Consider the following type:

struct my_type
{
    std::string str;
    std::vector<int> vec;

    my_type(const char* name)
    : str(name), vec({1, 2, 3, 4, 5})
    {}
};

Now consider what happens when you dynamically allocate it: The constructor of std::string and std::vector will (“might” for you pedantic people) also allocate dynamic memory. Even if you use an allocator for the dynamic allocation, it still does two more!

This is where joint allocations become useful. The idea is that you allocate a bigger memory block than needed for the object itself, and use the additional memory - the “joint memory” - for the dynamic allocation of the members.

With the facilities I’ve implemented in memory, this is very easy:

struct my_type : memory::joint_type<my_type>
{
    memory::string<memory::joint_allocator> str;
    memory::joint_array<int> vec;

    my_type(memory::joint tag, const char* name)
    : memory::joint_type<my_type>(tag),
      str(name, *this),
      vec({1, 2, 3, 4, 5}, *this)
    {}
};

We have to change my_type for it though. The first thing to do is to inherit from memory::joint_type. This base will insert two pointers for managing the joint memory.

Then each member with dynamic allocations must use the joint_allocator in order to use the joint memory. joint_allocator is a RawAllocator that will use the joint memory of a given object for dynamic memory allocation. In this case we use it with std::string.

memory::string<RawAllocator> is just a template alias for std::basic_string<char, std::char_traits<char>, memory::std_allocator<char, RawAllocator>>. It is not a new string type or anything like that.

Because the memory::joint_allocator has a bit of overhead - an additional pointer to be precise, there is also memory::joint_array<T>. This is a dynamic fixed size array, i.e. a std::vector<T> that cannot grow. It is designed to use joint memory and has no overhead.

All constructors for the joint type must also take an object of memory::joint as first parameter. This object has two jobs: First, it can only be created by friends, so it prohibits accidental creation of joint types without joint memory. Second, it contains meta data about the joint memory and needs to be passed to the joint_type.

Because of the custom allocators, we have to pass an allocator to the objects. This is simple *this, the object with the joint memory.

It is very important that you always pass *this as allocator, not some other object.

In order to create a joint type we use the allocate_joint function:

auto ptr = memory::allocate_joint<my_type>
                            (memory::default_allocator{},
                             memory::joint_size(…),
                             "joint!");
                             
std::cout << ptr->str << '\n';
for (auto& el : *ptr)
    std::cout << el << ' ';
std::cout << '\n';

The function takes the allocator used for the - single! - allocation, the size of the joint memory and additional arguments passed to the types constructor. The size has the type memory::joint_size which is explicitly convertible from a std::size_t. The only downside with joint memory is the manual calculation of the size beforehand. When doing so, one also has to keep alignment buffers in mind. If the size isn’t enough, it will throw an exception.

The return type of allocate_joint is memory::joint_ptr<T, RawAllocator>. It behaves similar to std::unique_ptr<T>, but owns the entire joint memory block and will deallocate it when it goes out of scope.

For more information, check out the example.

About Allocator propagation

In my first real blog post I’ve talked about how the STL Allocator model has these propagate_on_XXX typedefs. These control whether the allocator will be copy/move assigned/swapped when the container is copy/move assigned/swapped. The select_on_container_copy_construction() member function controls what happens on container copy construction, move construction cannot be customized.

In that post I said that the defaults of no propagation are bad, as they can lead to performance pessimization, undefined and un-intuitive behavior. I proposed that you should always change the defaults so container assignment will also assign the allocator.

After the blog post I got an e-mail from Alisdair Meredith who designed that part of the allocator model. He explained the reasons behind the choices, mainly because of containers where the allocator is shared with the members. I wrote more about it in this blog post. I wasn’t quite convinced why this was necessary, but didn’t run into the situation myself, so I did not comment it further.

But with the joint allocations, I did run into the situation. Consider what happens when we have two joint objects and assign them:

auto a = memory::allocate_joint<my_type>(…);
auto b = memory::allocate_joint<my_type>(…);

*a = *b;

This will assign all members, so also the str container. str uses a joint_allocator inside the std_allocator adapter that allows using RawAllocators in STL containers. The default propagation choice inside the std_allocator is always propagate containers, which was the guideline I made in the original post.

So the assignment operator of the container will assign the allocator from a->str to the allocator used by b->str. The str object from a will use the allocator using joint memory from b! b might not have enough memory to start with, but imagine b getting destroyed before a. This will also destroy bs memory, so a now uses destroyed memory.

This is bad, so propagation isn’t the right choice here. We don’t want that the allocator gets assigned when the container gets assigned - similar for swap. As swapping two containers with unequal allocators is undefined behavior, this forbids swaps between containers of different joint memory, only swapping between members of a joint object is allowed.

The same issue exists with copy construction. If we write the copy constructor of my_type like so:

my_type(memory::joint tag, const joint_type& other)
: memory::joint_type<my_type>(tag),
  str(other.str),
  vec(other.vec)
{}

Remember, each constructor must take memory::joint as first parameter.

str will copy the allocator from other.str, so it will use the joint memory from other instead of *this. You have to use the copy constructor version that takes an allocator:

str(other.str, *this) // copy construct str using *this as allocator

Luckily, copy construction calls select_on_container_copy_construction(), so by putting a static_assert() inside there we can stop this code from compiling. Sadly, there is no select_on_container_move_construction(), so you have to watch out there.

memory::joint_array does not provide the regular copy/move constructors, so is safe to use.

In order to control the propagation behavior by the std_allocator, I’ve put the default behavior into the propagation_traits. They can be specialized for own RawAllocators and control the propagation behavior of std_allocator.

Minor features

In addition to those two major features, I’ve implemented a couple of smaller ones.

Block size literals

If you’re using any arena allocator (like memory::memory_pool, memory::memory_stack,…), you often create them like so:

memory::memory_pool<> pool(16, 4096);

The 4096 is the initial size of the arena, so 4KiB. For convenience, I’ve added user defined literals for those, so now you can write:

using namespace memory::literals;
memory::memory_pool<> pool(16, 4_KiB);

The header memory_arena.hpp now provides user defined literals for KiB, MiB and GiB going multiple of 1024 and KB, MB and GB going multiple of 1000. They simple return a std::size_t.

`temporary_allocator` improvements

The temporary_allocator is a facility for temporary allocations. It uses a global, thread-local stack to allow fast allocations.

In this update the stack became public as temporary_stack and the creation can now be controlled. The macro FOONATHAN_MEMORY_TEMPORARY_STACK_MODE can be set two 0, 1 or 2.

0 means that there will not be any stack created automatically, you have to crate a temporary_stack object yourself in a top-level function and pass that down.

With 1 there is one per-thread stack available by calling get_temporary_stack(), but it will not be destroyed automatically. For that you have to use the temporary_stack_initializer class, create on object in a top-level function, the destructor will destroy the stack.

And with 2 the stack will be destroyed automagically, but with a slight runtime overhead. You can still use temporary_stack_initializer though, but it is not required anymore.

Stack allocator additions

I’ve added memory_stack_raii_unwind which does exactly what you think it does, as well as iteration_allocator.

iteration_allocator is designed if you do many allocations in a loop, where each allocation must live for N iterations and can then be destroyed. This is a generalization of the double-frame allocator. It consists of N memory stacks internally and switches between them on each iteration. If it cycles back to a stack, it will clear it and free all it’s memory:

// creates it with 2 stacks,
// each one using 2KiB memory
memory::iteration_allocator<2> alloc(4_KiB);

while (…)
{
    auto mem = alloc.allocate(…);
    // mem now lives for two iterations
    
    …

    // switch stacks
    alloc.next_iteration(); 
}

Conclusion

This update also comes with OS X support and many bugfixes.

The documentation is currently still using Doxygen, but as standardese, is almost at a point where I can use it, I will soon transfer it over and also improve the documentation.

In the meantime, you can also check out the slides for my Meeting C++ talk about it and try the library. The next update will probably tackle per-thread allocators and will most likely be the last 0.x version.

As always: I appreciate any feedback, feature requests, etc., so feel free to contact me!

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.