I accidentally wrote a Turing-complete parsing library

I’m currently working on lexy, a C++ parsing DSL library: you describe how input should be parsed, and lexy generates code for it, taking care of error recovery, parse tree generation, and parse values. Such parser generators are classified based on the expressiveness of the corresponding formal language. For example, a strict regular expression can only parse regular languages, which is a strict subset of a deterministic context free language, and so on.

lexy, being essentially syntax sugar for a recursive descent parser with (manually specified!) arbitrary lookahead but no other state, falls in the latter category. Parsers in that category are unable to parse context-sensitive languages such as XML with matching tags. To handle them, I’ve added support for “context variables”: state that can be modified during parsing.

However, in a recent refactoring of the context variables implementation, I’ve accidentally removed a big limitation, which makes lexy Turing-complete: the parser is thus able to do arbitrary computation while parsing the input.

TL;DR: I’ve written a lexy grammar that is able to execute, not just parse, a simple Turing-complete language.

» read more »
Jonathan

Tutorial: the CRTP Interface Technique

Generic code expects that your types model certain concepts. Sometimes, the concept requires many redundant member functions in your type. A big culprit here are iterators: they require many operator overloads, most of which are trivially implemented in terms of other overloads.

CRTP, the curiously recurring template pattern, can help here and automate the boilerplate away. Let’s look at the CRTP interface technique and explore how it works.

» read more »
Jonathan

C++20 concepts are structural: What, why, and how to change it?

C++20 added concepts as a language feature. They’re often compared to Haskell’s type classes, Rust’s traits or Swift’s protocols.

Yet there is one feature that sets them apart: types model C++ concepts automatically. In Haskell, you need an instance, in Rust, you need an impl, and in Swift, you need an extension. But in C++? In C++, concepts are just fancy boolean predicates that check for well-formed syntax: every type that makes the syntax well-formed passes the predicate and thus models the concepts.

This was the correct choice, but is sometimes not what you want. Let’s explore it further.

» read more »
Jonathan

Tutorial: Interactive code snippets with Hugo and Compiler Explorer

I’m currently rewriting the documentation for lexy, my C++ parser combinator library – hey, this is the fourth blog post in a row mentioning it in the introduction! It already has an interactive online playground where you can enter a grammar and input and see the resulting parse tree and/or error messages. This is really helpful, so the new documentation will contain examples that are directly available in the playground, to try it out and see what happens.

While implementing this, I’ve realized that this can also be extended to ordinary code snippets on my blog. Without any Javascript or manual work on my part, I can give the code snippet a “play button” that directly links to Compiler Explorer, like this:

» read more »
Jonathan

Implementation Challenge: Lossless, compact parse tree with iterative traversal

My parser combinator library lexy was originally designed to parse some grammar into a user-defined data structure, comparable to Boost.Spirit. This is ideal for parsing simple “data” grammars like JSON or email addresses, and also works for parsing programming languages: simply parse into your AST. However, by design lexy::parse() will only forward data explicitly produced by the parsing combinators which does not include punctuation, comments, or whitespace.

Inspired by matklad’s blog post about modern parser generators, I’ve decided to add a way to retain all the information and produce a lossless parse tree by calling lexy::parse_as_tree(). This requires no changes to your existing grammar and simply switches the output. With that, I could also add an online playground that visualizes the parse tree of a given grammar on the given input.

Implementing the actual code that produces a parse tree during parsing wasn’t too hard – I’ve already had a handler that controls what happens during parsing to implement lexy::match() and lexy::validate(). The challenging part was the actual data structure for storing a parse tree: it should be memory-efficient, as it can be big, and users should be able to easily iterate over every node without requiring recursion.

» read more »
Jonathan