standardese documentation generator version 0.3: Groups, inline documentation, template mode & more

26 Nov 2016 by Jonathan

After two bugfix releases for the parsing code, I finally got around to implementing more features for standardese. A complete refactoring of the internal code allowed me to implement some advanced features: standardese now comes with member groups, the ability to show inline documentation, a template language and many minor things that just improve the overall documentation generation.

standardese is a documentation generator specifically designed for C++ code. It supports and detects many idioms for writing C++ documentation. It aims to be a replacement for Doxygen.

Yet again an update on the parsing situation

I’m using libclang for the parsing, but because it has many limitations, I’m forced to run my own parser over the tokens of each entity to get the required information.

But because libclang’s tokenizer does not preprocess the tokens, I’ve used Boost.Wave to preprocess the tokens, then parse them. But this leads to problems if you have source entities that are generated by a macro, like in the following example:

#define MAKE_STRUCT(name) \
struct name \
{ \
    int a; \
};

MAKE_STRUCT(foo)
MAKE_STRUCT(bar)

When parsing foo or bar, I’ll get the tokens of the macro, instead of the expanded tokens. Because I do not want to influence the way you write C++ code, I was forced to do something else.

This is one of standardese’s main goals: There should be no need to adapt your code to standardese. It should just support everything out of the box. That’s why it provides ways to completely modify the synopsis of each entity, etc.

In the 0.2-2 patch, I’ve changed the preprocessing code so that Boost.Wave preprocesses the entire file and libclang then parses the result. That way I do not have to worry about any preprocessing.

But Boost.Wave is slow and also can’t handle many of the extensions used by the standard library headers, so I needed a lot of workarounds there.

In this version I finally replaced Boost.Wave and now I use clang for the preprocessing.

I literally use clang: I call the binary from the code with the -E flag to get the preprocessed output and parse that. I know that this is a bad solution, but it is just a temporary one until I find a proper library for preprocessing.
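As a rough illustration of what "calling the binary with -E" can look like, here is a minimal sketch. It is not standardese's actual code: the function names, the quoting scheme and the choice of popen() (POSIX only) are assumptions made for this example.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Builds a command line such as: clang++ -E -I"include" "foo.hpp"
// The binary name and the simple quoting are assumptions for illustration.
std::string make_preprocess_command(const std::string&              binary,
                                    const std::string&              file,
                                    const std::vector<std::string>& include_dirs)
{
    std::string cmd = binary + " -E";
    for (const auto& dir : include_dirs)
        cmd += " -I\"" + dir + "\"";
    cmd += " \"" + file + "\"";
    return cmd;
}

// Runs the command through the shell and returns everything it wrote to stdout.
// popen()/pclose() are POSIX; Windows offers _popen()/_pclose() instead.
std::string run_and_capture(const std::string& cmd)
{
    std::unique_ptr<FILE, int (*)(FILE*)> pipe(popen(cmd.c_str(), "r"), pclose);
    if (!pipe)
        throw std::runtime_error("could not start: " + cmd);

    std::string output;
    char        buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof(buffer), pipe.get())) > 0)
        output.append(buffer, n);
    return output;
}
```

The string returned by run_and_capture() for such a command would be the preprocessed file, which can then be handed to the parser in place of the original source.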

But let’s talk about interesting features.

Member groups

You often have code that looks like this:

class foo
{
public:

    …

    /// \returns A reference to the variable.
    T& get_variable()
    {
        return var_;
    }

    /// \returns A reference to the variable.
    const T& get_variable() const
    {
        return var_;
    }
};

Multiple functions do practically the same thing but have slightly different signatures. It would be very tedious to repeat the documentation over and over again.

With member groups you don’t have to:

class foo
{
public:
    /// \returns A reference to the variable.
    /// \group get_variable
    T& get_variable()
    {
        return var_;
    }

    /// \group get_variable
    const T& get_variable() const
    {
        return var_;
    }
};

The \group command adds an entity to a member group. As the name implies, this only works for entities that are members of the same class/namespace/etc. The group name is just an internal identifier for the group and only needs to be unique in that scope.

The first entity with a new group identifier is the main entity for the group: Its comment will be taken for the group comment and its type defines the header used for the group. With groups the output will look like this:

Function `foo::get_variable`

(1)    T& get_variable();

(2)    const T& get_variable() const;

Returns: A reference to the variable.

This is similar to the way cppreference.com does its documentation.
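The "first entity wins" rule can be pictured with a small sketch. The types and the function below are hypothetical and only model the behavior described above; they are not taken from the standardese sources.

```cpp
#include <map>
#include <string>
#include <vector>

// A documented entity, reduced to what the grouping needs (hypothetical type).
struct entity
{
    std::string signature;
    std::string group;   // empty if the comment has no \group command
    std::string comment; // documentation text, may be empty
};

struct member_group
{
    std::string              comment;    // taken from the main entity
    std::vector<std::string> signatures; // shown as (1), (2), ... in the output
};

// Collects all entities of one scope into their groups.
// The first entity seen for a group name becomes the main entity,
// so its comment is the one stored for the whole group.
std::map<std::string, member_group> collect_groups(const std::vector<entity>& scope)
{
    std::map<std::string, member_group> groups;
    for (const auto& e : scope)
    {
        if (e.group.empty())
            continue;
        // emplace() leaves an existing group untouched, which keeps the first comment
        auto result = groups.emplace(e.group, member_group{e.comment, {}});
        result.first->second.signatures.push_back(e.signature);
    }
    return groups;
}
```

Feeding the two get_variable() overloads from the example into collect_groups() would yield a single group with two signatures and the comment of the first overload.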

Modules

I’ve also added modules as a way to group related entities together. The \module command adds an entity to a module. An entity can be in at most one module, and the module will be passed on to all children. For example, if you do it in a namespace, it will add all entities in that namespace to that module.

The module will be shown in the documentation of the entity by default (this can be controlled by the output.show_modules option), and a new index file standardese_modules will list all modules with all entities in each module.

They are useful if you have multiple logical components in your project and want to give a quick overview.
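How a module is passed on to children and collected into an index can be sketched as follows. This is a simplified model with hypothetical types; in particular, letting a child's own \module take precedence over the inherited one is an assumption of this sketch, not something stated above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical entity tree: a namespace or class with its children.
struct entity
{
    std::string         name;
    std::string         module; // empty if no \module command on this entity
    std::vector<entity> children;
};

// Maps each module name to the entities it contains.
using module_index = std::map<std::string, std::vector<std::string>>;

// Walks the tree and records every entity under its module.
// `inherited` is the module passed on from the parent, if any.
void index_modules(const entity& e, const std::string& inherited, module_index& index)
{
    // Assumption: an entity's own \module wins, otherwise it takes the parent's.
    const std::string& module = e.module.empty() ? inherited : e.module;
    if (!module.empty())
        index[module].push_back(e.name);
    for (const auto& child : e.children)
        index_modules(child, module, index);
}
```

Calling index_modules() on a namespace that carries a \module command puts the namespace and everything inside it into the same entry of the index.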

Entity linking improvements

Inside a comment there are two syntaxes for linking to a different entity:

[some text](<> "unique-name") (CommonMark link without URL but with title)
[unique-name]() (CommonMark link without URL)

The unique-name is the unique identifier of the entity you want to refer to. The correct URL will be filled in by standardese.

The unique-name can also refer to an external entity. For example, by default std::XXX will create a link to the corresponding C++ reference page. This can be customized and extended by the comment.external_doc option.

Now I’ve added a third syntax: [some-text](standardese://unique-name/), i.e. a CommonMark link with a URL in the standardese:// protocol. Like with the other two options, standardese will fill in the URL automatically.

This syntax was mainly added for the template mode, see below.
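Extracting the unique name from such a URL is plain string handling. The helper below is a hypothetical sketch of that step, assuming the form standardese://unique-name/ shown above; it is not the tool's real parser.

```cpp
#include <string>

// Returns true and stores the unique name if `url` uses the standardese:// protocol.
// Any other URL is left alone and the function returns false.
bool parse_standardese_url(const std::string& url, std::string& unique_name)
{
    const std::string prefix = "standardese://";
    if (url.compare(0, prefix.size(), prefix) != 0)
        return false;

    unique_name = url.substr(prefix.size());
    // drop the trailing slash that closes the URL
    if (!unique_name.empty() && unique_name.back() == '/')
        unique_name.pop_back();
    return !unique_name.empty();
}
```

For the link target standardese://func()/ this yields the unique name func(), while a regular https:// URL is rejected and can be passed through unchanged.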

But a problem with that linking model was that the unique-name is verbose:

// unique name is: ns
namespace ns
{
    // unique name is: ns::foo(void*)
    // unique name of param is: ns::foo(void*).param
    void foo(void* param);

    // unique name is: ns::bar<T>
    template <typename T> // unique name of `T` is: ns::bar<T>.T
    struct bar
    {
        // unique name is: ns::bar<T>::f1()
        void f1();
    
        // unique name is: ns::bar<T>::f2()
        void f2();
    };
}

While you don’t need the signature for functions that are not overloaded, and while you can rename the unique name to an arbitrary string with the \unique_name command, this is still verbose. For example, if you wanted to link from f2() to f1(), you had to type: [ns::bar<T>::f1()]().

Now I’ve added a link mode with name lookup. Simply start the unique name with * or ? and standardese will search for an entity with rules similar to the regular C++ name lookup. So with that you can simply link to f1() from f2() by writing: [*f1()]().

Name lookup only works in comments associated with a C++ entity and will be done from that C++ entity. It does not work in template files, for example.
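The idea of such a lookup can be sketched as a walk from the innermost scope outwards. This is a deliberately simplified model, not the rules standardese implements: the function name is hypothetical, and splitting scopes at the last :: would break for template arguments that themselves contain ::.

```cpp
#include <set>
#include <string>

// Looks up `name` starting in `scope` and walking outwards.
// Returns the matching unique name, or an empty string if nothing matches.
std::string lookup(const std::set<std::string>& registered,
                   std::string scope, const std::string& name)
{
    for (;;)
    {
        const std::string candidate = scope.empty() ? name : scope + "::" + name;
        if (registered.count(candidate) != 0)
            return candidate;
        if (scope.empty())
            return std::string();

        // strip the innermost scope: "ns::bar<T>" -> "ns" -> ""
        const auto pos = scope.rfind("::");
        scope = (pos == std::string::npos) ? std::string() : scope.substr(0, pos);
    }
}
```

With the unique names from the listing above registered, looking up f1() from the scope ns::bar<T> finds ns::bar<T>::f1() at once, and looking up foo(void*) from the same scope falls back to ns::foo(void*).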

Inline documentation

The documentation for some entities will now be shown inline by default. This applies to parameters, member variables of a struct, enum values and base classes. Previously, if you documented them, standardese would add a new section for each of them, repeat their synopsis, etc.:

Enumeration `foo`

enum class foo
{
    a,
    b,
    c
};

An enum.

Enumeration constant `foo::a`

The value a.

Enumeration constant `foo::b`

The value b.

Enumeration constant `foo::c`

The value c.

Struct `bar`

struct bar
{
    int a;
};

A struct.

Variable `bar::a`

int a;

Some variable.

Function `func`

void func(int a);

A function.

Parameter `func::a`

int a

A parameter.

Now they can be shown inline, in a little list:

Enumeration `foo`

enum class foo
{
    a,
    b,
    c
};

An enum.

Enum values:

a - The value a.
b - The value b.
c - The value c.

Struct `bar`

struct bar
{
    int a;
};

A struct.

Members:

a - Some variable.

Function `func`

void func(int a);

A function.

Parameters:

a - A parameter.

It goes without saying that links to those entities will resolve to the correct list position, right?

Other improvements

There are many smaller things.

You can now completely control the synopsis of an entity with the \synopsis command. Simply set the synopsis to an arbitrary string that will be shown instead of the actual synopsis. Previously you could only hide, for example, certain parameters of a function.

The headings are now improved. Previously it only showed the type of the entity: Function bar(), Constructor foo(const foo&). Now it detects certain signatures and gives them more semantic meaning: Copy constructor foo(const foo&), Comparison operator operator==, etc.

The “definition” of a macro can now be hidden from the synopsis by the global output.show_macro_replacement option. This is useful as macro definitions are often implementation details.

There are also a few breaking changes: To do a hard line break in a comment, you cannot use the CommonMark backslash at the end of a line anymore; you have to use a forward slash instead (this is a technical limitation). The \entity and \file commands for remote comments must now be at the beginning of a comment and not at an arbitrary position. Also, the unique name of function templates got simplified: you must not pass the template parameters there anymore.

But let’s address the biggest and most powerful feature: template mode.

Template mode

standardese now also works as a basic templating language. If you pass in files that are not header files, they will be preprocessed. This does two things: correctly linking all URLs in the standardese:// protocol and replacing special commands.

This can be best shown by an example. Consider the following C++ input file:

/// Struct a.
struct a {};

/// A function.
void func();

/// Struct b.
struct b {};

A non-source file input like this one:

### A heading

This file is in Markdown format, but you can use *anything* you want.
standardese doesn't care about the format,
it just does dumb text manipulation.

I can link to [the function](standardese://func()/) and it will be resolved.
But I can also show output of standardese here:

{ { standardese_doc_synopsis func() commonmark } }

This line will be replaced with the synopsis of `func()` in the commonmark format.
But it can be more advanced:

{ { standardese_for $entity file.hpp } }
    { { standardese_if $entity name func() } }
    { { standardese_else } }
       *  { { standardese_doc_text $entity commonmark } }
    { { standardese_end } }
{ { standardese_end } }

This will show the documentation text of the two structs.

Note: I had to add spaces between the { { and } }, because Jekyll was parsing them. The actual syntax does not use those spaces, and standardese will silently ignore any commands not starting with standardese_, so it works nicely with it.

Pass both files to standardese and it will create the regular documentation for the C++ file as well as preprocess the template file to this:

A heading

This file is in Markdown format, but you can use anything you want. standardese doesn’t care about the format, it just does dumb text manipulation.

I can link to the function (manual edit: link doesn’t work here obviously) and it will be resolved. But I can also show output of standardese here:

void func();

This line will be replaced with the synopsis of func() in the CommonMark format. But it can be more advanced:

   *  Struct a.

   *  Struct b.

This will show the documentation text of the two structs.
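The rule from the note above, that only commands starting with standardese_ are handled and everything else is left alone, can be sketched with a small scanner. This is an illustrative assumption about how such filtering might work, not the real template engine; the delimiters are built from single characters so that this listing itself contains no literal double braces.

```cpp
#include <string>
#include <vector>

// Returns the trimmed content of every command block whose text
// starts with "standardese_". Other blocks are skipped untouched.
std::vector<std::string> find_commands(const std::string& text)
{
    const std::string lbraces(2, '{'); // opening delimiter, two '{'
    const std::string rbraces(2, '}'); // closing delimiter, two '}'

    std::vector<std::string> commands;
    std::string::size_type   pos = 0;
    while ((pos = text.find(lbraces, pos)) != std::string::npos)
    {
        const auto end = text.find(rbraces, pos + 2);
        if (end == std::string::npos)
            break; // unterminated block, stop scanning

        std::string body = text.substr(pos + 2, end - pos - 2);
        // trim surrounding spaces and tabs
        const auto first = body.find_first_not_of(" \t");
        const auto last  = body.find_last_not_of(" \t");
        if (first == std::string::npos)
            body.clear();
        else
            body = body.substr(first, last - first + 1);

        if (body.compare(0, 12, "standardese_") == 0)
            commands.push_back(body);
        pos = end + 2;
    }
    return commands;
}
```

A block that belongs to another tool, such as a Jekyll variable, is simply never reported, which is why both systems can process the same file.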

This is useful if you want to write additional files, like tutorials. But with the --template.default_template option you can pass a file that will customize the entire output. If you pass none, it will behave like this:

{ { standardese_doc $file $format } }

Again, in reality no spaces.

$file will refer to the current file, $format to the specified output format. This will render the documentation for each file as standardese would do it. Check out the readme for a quick template syntax overview.

But if you want to use additional files, you’ll love the standardese_doc_anchor command. With the standardese:// protocol you can link to parts of the generated documentation. But with the anchor command, you can link back:

{ { standardese_doc_anchor unique-name <format> } }

Without the spaces.

This will create an anchor in the file. But the unique-name will be registered, so you can use it as a link target inside the documentation comments!

If the unique-name already exists, this will change the link for that entity. With it you can override where the “actual” documentation is.
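The override behavior can be modeled with a small registry in which anchors replace existing link targets while regular entities do not. The class and the URLs used with it are hypothetical and only illustrate the described behavior.

```cpp
#include <map>
#include <string>

// Hypothetical registry mapping unique names to the URL a link resolves to.
class link_registry
{
public:
    // Called for every documented entity; keeps a target that already exists.
    void register_entity(const std::string& unique_name, const std::string& url)
    {
        targets_.emplace(unique_name, url);
    }

    // Called for an anchor command; replaces any existing target.
    void register_anchor(const std::string& unique_name, const std::string& url)
    {
        targets_[unique_name] = url;
    }

    // Returns the URL for a unique name, or an empty string if it is unknown.
    std::string resolve(const std::string& unique_name) const
    {
        const auto iter = targets_.find(unique_name);
        return iter == targets_.end() ? std::string() : iter->second;
    }

private:
    std::map<std::string, std::string> targets_;
};
```

In this model an anchor wins regardless of whether it is registered before or after the entity with the same unique name, so a tutorial file can become the place that links to func() resolve to.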

The template language is currently very basic and the error messages if you mess up are bad, but it’s already worth it and will be improved in the future.

What’s next?

With this release, standardese is at a point where I’m going to migrate Doxygen documentation to it. But I’ll continue working on it. I have many features planned and I might already start tackling automated comment generation based on the code alone.

If you want to see a live demo, check out my Meeting C++ Lightning Talk. You can get the tool from the GitHub page; read the readme for more information.

This blog post was written for my old blog design and ported over. If there are any issues, please let me know.
