Tutorial: Preparing libraries for CMake FetchContent

If you’re working on an executable project in C++, as opposed to a C++ library, using a package manager to get your dependencies might be overkill: If all you need is to get the source code of a library, include in your CMake project, and have it compiled from source with the rest of your project, CMake’s FetchContent module can do it for you.

If you’re a library writer, there are ways you can structure your CMake project to improve the experience for end users that use FetchContent: hide developer targets like tests, provide a zip archive that contains only the source files relevant downstream, and use GitHub actions to create it automatically.

Let’s see how.

» read more »
Jonathan

Technique: Recursive variants and boxes

There are many data structures that can be elegantly expressed using sum types. In C++ a (somewhat clunky) implementation of sum types is std::variant. However, it can’t handle recursive data structures, where one alternative contains the entire sum type again.

Let’s see how we can fix that.

» read more »
Jonathan

saturating_add vs. saturating_int – new function vs. new type?

Suppose you want to do integer arithmetic that saturates instead of overflowing. The built-in operator+ doesn’t behave that way, so you need to roll something yourself. Do you write a saturating_add() function or a new saturating_int type with overloaded operator+? What about atomic_load(x) vs. atomic<int> x? Or volatile_store(ptr, value) vs. volatile int*?

When should you provide functions that implement new behavior and when should you write a wrapper type? Let’s look at the pro and cons.

» read more »
Jonathan

Technique: Compile Time Code Generation and Optimization

C++ constexpr is really powerful. In this blog post, we’ll write a compiler that can parse a Brainfuck program given as string literal, and generate optimized assembly instructions that can then be executed at runtime. The best part: we neither have to actually generate assembly nor optimize anything ourselves! Instead we trick the compiler into doing all the hard work for us.

The same technique can be used whenever you want to specify some sort of “program” in a different way and translate it at runtime: regexes, routing tables, etc.

» read more »
Jonathan

I accidentally wrote a Turing-complete parsing library

I’m currently working on lexy, a C++ parsing DSL library: you describe how input should be parsed, and lexy generates code for it, taking care of error recovery, parse tree generation, and parse values. Such parser generators are classified based on the expressiveness of the corresponding formal language. For example, a strict regular expression can only parse regular languages, which is a strict subset of a deterministic context free language, and so on.

lexy, being essentially syntax sugar for a recursive descent parser with (manually specified!) arbitrary lookahead but no other state, falls in the latter category. Parsers in that category are unable to parse context-sensitive languages such as XML with matching tags. To handle them, I’ve added support for “context variables”: state that can be modified during parsing.

However, in a recent refactoring of the context variables implementation, I’ve accidentally removed a big limitation, which makes lexy Turing-complete: the parser is thus able to do arbitrary computation while parsing the input.

TL;DR: I’ve written a lexy grammar that is able to execute, not just parse, a simple Turing-complete language.

» read more »
Jonathan