My parser combinator library lexy was originally designed to parse some grammar into a user-defined data structure, comparable to Boost.Spirit.
This is ideal for parsing simple “data” grammars like JSON or email addresses, and also works for parsing programming languages: simply parse into your AST.
However, by design lexy::parse() will only forward data explicitly produced by the parsing combinators, which does not include punctuation, comments, or whitespace.
Inspired by matklad’s blog post about modern parser generators, I’ve decided to add a way to retain all the information and produce a lossless parse tree by calling lexy::parse_as_tree().
This requires no changes to your existing grammar and simply switches the output.
With that, I could also add an online playground that visualizes the parse tree of a given grammar on the given input.
Implementing the actual code that produces a parse tree during parsing wasn’t too hard: I already had a handler that controls what happens during parsing, which is how lexy::match() and lexy::validate() are implemented.
The challenging part was the actual data structure for storing a parse tree:
it should be memory-efficient, as it can be big, and users should be able to easily iterate over every node without requiring recursion.
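To make those two requirements concrete, here is a minimal sketch of one possible layout. It is an assumption for illustration, not necessarily what lexy does: all nodes live in a single array in pre-order, and each node records the size of its subtree, so visiting every node is a plain loop and finding children means skipping whole subtrees. The names `node`, `flat_tree`, and `reconstruct` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical node kinds, for illustration only.
enum class node_kind : std::uint8_t { production, token };

// One fixed-size record per node; no per-node heap allocation or child pointers.
struct node
{
    node_kind     kind;
    std::uint32_t begin;        // offset of the node's text in the input
    std::uint32_t length;       // length of that text
    std::uint32_t subtree_size; // nodes in this subtree, including the node itself
};

class flat_tree
{
public:
    // Opens a production node and returns its index so it can be closed later.
    std::size_t open(std::uint32_t begin)
    {
        nodes_.push_back({node_kind::production, begin, 0, 1});
        return nodes_.size() - 1;
    }

    // Appends a token leaf (this includes punctuation and whitespace).
    void token(std::uint32_t begin, std::uint32_t length)
    {
        nodes_.push_back({node_kind::token, begin, length, 1});
    }

    // Closes a production: everything appended since open() is its subtree.
    void close(std::size_t index, std::uint32_t end)
    {
        nodes_[index].length       = end - nodes_[index].begin;
        nodes_[index].subtree_size = static_cast<std::uint32_t>(nodes_.size() - index);
    }

    // All nodes in pre-order; iterating this needs no recursion.
    const std::vector<node>& nodes() const { return nodes_; }

    // Indices of the direct children, found by skipping over whole subtrees.
    std::vector<std::size_t> children(std::size_t parent) const
    {
        std::vector<std::size_t> result;
        const std::size_t end = parent + nodes_[parent].subtree_size;
        for (std::size_t i = parent + 1; i < end; i += nodes_[i].subtree_size)
            result.push_back(i);
        return result;
    }

private:
    std::vector<node> nodes_;
};

// Rebuilds the input by concatenating every token leaf in order.
// If the tree is lossless, the result equals the original input.
std::string reconstruct(const flat_tree& tree, std::string_view input)
{
    std::string out;
    for (const node& n : tree.nodes())
        if (n.kind == node_kind::token)
            out.append(input.substr(n.begin, n.length));
    return out;
}
```

The trade-off in this sketch is that a node does not know its parent or its next sibling without extra bookkeeping, and the tree can only be built in one left-to-right pass, which happens to match how a parser produces it.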