Supplemental: Macro Programming (Part 1): Writing Macros in the Simplest Way #

Hello, I am Chen Tian.

After the previous lesson, you should understand why [lecture 6] said that the essence of macros is quite simple. Leaving quote/unquote aside, the main job in macro programming is to convert one syntax tree into another. Going one level deeper, it is just a transformation from one data structure to another.

So, how exactly are macros in Rust capable of performing such transformations?

Next, let’s build declarative macros and procedural macros by hand. I hope that by writing your own macros you will grasp the thinking and the methods behind these data transformations. Once you have mastered them, you can handle almost any problem in macro programming.

How to Construct Declarative Macros #

First, let’s see how declarative macros are created.

We create a new project with cargo new macros --lib, then create an examples directory in the generated project and add examples/rule.rs:

#[macro_export]
macro_rules! my_vec {
    // A my_vec with no arguments, here we'll create an empty vec
    () => {
        std::vec::Vec::new()
    };
    // Handle my_vec![1, 2, 3, 4]
    ($($el:expr),*) => ({
        let mut v = std::vec::Vec::new();
        $(v.push($el);)*
        v
    });
    // Handle my_vec![0; 10]
    ($el:expr; $n:expr) => {
        std::vec::from_elem($el, $n)
    }
}

fn main() {
    let mut v = my_vec![];
    v.push(1);
    // When invoking, you can use [], (), {}
    let _v = my_vec!(1, 2, 3, 4);
    let _v = my_vec![1, 2, 3, 4];
    let v = my_vec! {1, 2, 3, 4};
    println!("{:?}", v);

    // Build a Vec containing ten copies of 1
    let v = my_vec![1; 10];
    println!("{:?}", v);
}

As mentioned in the previous lesson, declarative macros are defined with macro_rules!. macro_rules! uses pattern matching, so you can provide multiple rules, each pairing a pattern with the code to expand to when that pattern matches.

Looking at this code, we’ve written three matching rules.

The first, () => { std::vec::Vec::new() }, is easy to understand: if no arguments are passed in, it creates a new Vec. Note that a macro is expanded at the call site, and we cannot know whether the caller’s scope has the relevant use declarations, so the code a macro generates should use fully qualified paths.

The second rule, ($($el:expr),*), deserves a closer look.

In declarative macros, each parameter to capture is declared with the $ symbol and must be given a fragment type. Here expr stands for an expression, so $el:expr names the matched expression $el. $(...),* tells the compiler to match any number of expressions separated by commas; each captured expression can then be accessed through $el.

Because the pattern contains a $(...)* repetition (ignoring the separator), the body must use a matching $(...)* to expand it. So $(v.push($el);)* expands to one push statement for each $el that was matched.
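To make the repetition syntax more concrete, here is a minimal sketch of a second macro built the same way. The name my_map! and its `key => value` syntax are hypothetical and not part of this lesson's code; the point is only to show a repetition that captures two expressions per item and allows an optional trailing comma.

```rust
use std::collections::HashMap;

// Hypothetical macro for illustration; not part of the lesson's code.
macro_rules! my_map {
    // Zero or more `key => value` pairs, with an optional trailing comma.
    ($($k:expr => $v:expr),* $(,)?) => {{
        let mut m = std::collections::HashMap::new();
        // This statement is expanded once per captured ($k, $v) pair.
        $(m.insert($k, $v);)*
        m
    }};
}

fn main() {
    let ages: HashMap<&str, u32> = my_map! { "alice" => 30, "bob" => 25, };
    assert_eq!(ages.len(), 2);
    assert_eq!(ages["alice"], 30);

    // With no pairs, the repetition expands to nothing.
    let empty: HashMap<&str, u32> = my_map! {};
    assert!(empty.is_empty());
}
```

As with my_vec!, the body refers to HashMap by its full path so that the macro does not depend on what the caller has imported.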

Once you understand the second rule, the third is easy: if two expressions separated by a semicolon are passed in, std::vec::from_elem is used to construct the Vec.

When writing declarative macros, we must explicitly declare the type (fragment specifier) of each parameter. The commonly used ones are summarized here:

  • item, such as a function, struct, module, etc.
  • block, a block of code. For instance, a series of expressions and statements wrapped in curly braces.
  • stmt, a statement. For example, an assignment statement.
  • pat, a pattern.
  • expr, an expression. This was used in the example just now.
  • ty, a type. Like Vec.
  • ident, an identifier. Such as a variable name.
  • path, a path. Like foo, ::std::mem::replace, transmute::<_, int>.
  • meta, metadata. Generally, it’s the data inside #[...] and #![...] attributes.
  • tt, a single token tree.
  • vis, a possibly empty Visibility modifier. Like pub, pub(crate).
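As a rough illustration of how several of these fragment types combine, the sketch below uses vis, ident, and ty in a single rule. The macro make_struct! and the Point struct are hypothetical names chosen for this example only.

```rust
// Hypothetical macro for illustration: generates a struct plus a constructor.
macro_rules! make_struct {
    ($vis:vis struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        $vis struct $name {
            $($field: $ty),*
        }

        impl $name {
            // One constructor parameter per captured field.
            $vis fn new($($field: $ty),*) -> Self {
                Self { $($field),* }
            }
        }
    };
}

make_struct!(pub struct Point { x: i32, y: i32 });

fn main() {
    let p = Point::new(1, 2);
    assert_eq!(p, Point { x: 1, y: 2 });
    println!("{:?}", p);
}
```

Here $name and $field must be identifiers because they are used as names in the generated code, while $ty can be any type, including a generic one such as Vec<String>.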

Constructing declarative macros is straightforward: follow the basic syntax and you can quickly turn a function or a repetitive fragment of statements into a declarative macro.

For example, when working with pipelines, I often match on expressions that return a Result as shown below, so that on error the function returns a PlugResult::Err carrying a PipelineError (an enum) instead of a Result:

match result {
    Ok(v) => v,
    Err(e) => {
        return pipeline::PlugResult::Err {
            ctx,
            err: pipeline::PipelineError::Internal(e.to_string()),
        }
    }
}

However, this pattern may appear repeatedly within the same function, and we can’t wrap it in a helper function, because the return has to exit the calling function. A declarative macro can express it and simplifies the code significantly:

#[macro_export]
macro_rules! try_with {
    ($ctx:ident, $exp:expr) => {
        match $exp {
            Ok(v) => v,
            Err(e) => {
                return pipeline::PlugResult::Err {
                    ctx: $ctx,
                    err: pipeline::PipelineError::Internal(e.to_string()),
                }
            }
        }
    };
}
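The lesson does not show the pipeline module, so the following is only a sketch of how try_with! might be used end to end. The definitions of PlugResult and PipelineError, the Continue variant, and the add_step function are all assumptions made up for this example.

```rust
// Hypothetical stand-ins for the pipeline types; the real definitions are not shown.
mod pipeline {
    #[derive(Debug, PartialEq)]
    pub enum PipelineError {
        Internal(String),
    }

    #[derive(Debug, PartialEq)]
    pub enum PlugResult<Ctx> {
        Continue(Ctx),
        Err { ctx: Ctx, err: PipelineError },
    }
}

macro_rules! try_with {
    ($ctx:ident, $exp:expr) => {
        match $exp {
            Ok(v) => v,
            Err(e) => {
                // `return` leaves the function that invoked the macro.
                return pipeline::PlugResult::Err {
                    ctx: $ctx,
                    err: pipeline::PipelineError::Internal(e.to_string()),
                };
            }
        }
    };
}

// Hypothetical pipeline step: parse two numbers and push their sum into the context.
fn add_step(mut ctx: Vec<i64>, a: &str, b: &str) -> pipeline::PlugResult<Vec<i64>> {
    let x = try_with!(ctx, a.parse::<i64>());
    let y = try_with!(ctx, b.parse::<i64>());
    ctx.push(x + y);
    pipeline::PlugResult::Continue(ctx)
}

fn main() {
    assert_eq!(add_step(vec![], "1", "2"), pipeline::PlugResult::Continue(vec![3]));
    match add_step(vec![], "1", "oops") {
        pipeline::PlugResult::Err { err, .. } => println!("failed: {:?}", err),
        other => println!("unexpected: {:?}", other),
    }
}
```

Each try_with! call either yields the Ok value or returns early from add_step, which is exactly the part a plain helper function could not do.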

How to Construct Procedural Macros #

Next, let’s talk about how to build procedural macros.

Procedural macros are considerably more complex than declarative macros, but the essence is the same for every kind: process an input TokenStream into an output TokenStream.

To construct a procedural macro, you need to build a separate crate and add the declaration of proc-macro in Cargo.toml:

[lib]
proc-macro = true

Only then will the compiler allow you to use #[proc_macro] and the related attributes. So we first add this declaration to the Cargo.toml of the crate we generated today, then enter the following code in lib.rs:

use proc_macro::TokenStream;

#[proc_macro]
pub fn query(input: TokenStream) -> TokenStream {
    println!("{:#?}", input);
    "fn hello() { println!(\"Hello world!\"); }"
        .parse()
        .unwrap()
}

The #[proc_macro] attribute declares the function as a function-like procedural macro. This is about the most basic procedural macro that still does something.

Users invoke it as query!(...). We print the incoming TokenStream, then parse a piece of code held in a string into a TokenStream and return it. Calling parse() on a string is a convenient way to obtain a TokenStream, because TokenStream implements the FromStr trait.

Now that we understand what this code does, let’s write an example to give it a try. Create examples/query.rs and enter the following code:

use macros::query;

fn main() {
    query!(SELECT * FROM users WHERE age > 10);
}

You can see that although SELECT * FROM users WHERE age > 10 is not valid Rust syntax, the Rust lexer still turned it into a TokenStream and passed it to the query macro.

Run cargo run --example query to see the input TokenStream printed by the query macro (the macro runs at compile time, so the output appears while the example is being built):

TokenStream [
    Ident {
        ident: "SELECT",
        span: #0 bytes(43..49),
    },
    Punct {
        ch: '*',
        spacing: Alone,
        span: #0 bytes(50..51),
    },
    Ident {
        ident: "FROM",
        span: #0 bytes(52..56),
    },
    Ident {
        ident: "users",
        span: #0 bytes(57..62),
    },
    Ident {
        ident: "WHERE",
        span: #0 bytes(63..68),
    },
    Ident {
        ident: "age",
        span: #0 bytes(69..72),
    },
    Punct {
        ch: '>',
        spacing: Alone,
        span: #0 bytes(73..74),
    },
    Literal {
        kind: Integer,
        symbol: "10",
        suffix: None,
        span: #0 bytes(75..77),
    },
]

Here, a TokenStream can be iterated over (it implements IntoIterator), yielding a series of TokenTree values:

pub enum TokenTree {
    Group(Group),
    Ident(Ident),
    Punct(Punct),
    Literal(Literal),
}

The last three are Ident (identifier), Punct (punctuation), and Literal (literal). Group exists because when your code contains delimiters, that is (), [] or {}, the contents inside them are parsed into a Group. Angle brackets <> are not delimiters; they show up as ordinary Punct tokens. You can also try changing the call to query! in the example like this:

query!(SELECT * FROM users u JOIN (SELECT * from profiles p) WHERE u.id = p.id and u.age > 10);

Then run cargo run --example query again to see what the TokenStream looks like now, and whether it includes a Group.
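If you want a feel for how delimiters turn a flat token list into nested groups before looking at the real output, the toy model below may help. It is only a sketch in plain Rust: Tree, lex, parse, and count_groups are hypothetical names, the types are not the real proc_macro types, and the lexer is far cruder than the compiler's (it splits only on whitespace and delimiters).

```rust
// Toy model for illustration only: these are NOT the real proc_macro types.
#[derive(Debug, PartialEq)]
enum Tree {
    // A delimited group: the opening delimiter plus the trees nested inside it.
    Group(char, Vec<Tree>),
    // Any other token.
    Leaf(String),
}

// Splits on whitespace and treats each of ( ) [ ] { } as its own token.
fn lex(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut cur = String::new();
    for ch in src.chars() {
        if ch.is_whitespace() || "()[]{}".contains(ch) {
            if !cur.is_empty() {
                tokens.push(std::mem::take(&mut cur));
            }
            if !ch.is_whitespace() {
                tokens.push(ch.to_string());
            }
        } else {
            cur.push(ch);
        }
    }
    if !cur.is_empty() {
        tokens.push(cur);
    }
    tokens
}

// Collects trees until the expected closing delimiter, or the end of the input.
fn parse(tokens: &mut impl Iterator<Item = String>, close: Option<char>) -> Vec<Tree> {
    let mut out = Vec::new();
    while let Some(tok) = tokens.next() {
        let closer = match tok.as_str() {
            "(" => Some(')'),
            "[" => Some(']'),
            "{" => Some('}'),
            _ => None,
        };
        if let Some(c) = closer {
            let open = tok.chars().next().unwrap();
            out.push(Tree::Group(open, parse(&mut *tokens, Some(c))));
        } else if tok.chars().next() == close {
            return out;
        } else {
            out.push(Tree::Leaf(tok));
        }
    }
    out
}

// Counts groups at every nesting level.
fn count_groups(trees: &[Tree]) -> usize {
    trees
        .iter()
        .map(|t| match t {
            Tree::Group(_, inner) => 1 + count_groups(inner),
            Tree::Leaf(_) => 0,
        })
        .sum()
}

fn main() {
    let src = "SELECT * FROM users u JOIN (SELECT * from profiles p) WHERE u.id = p.id";
    let trees = parse(&mut lex(src).into_iter(), None);
    assert_eq!(count_groups(&trees), 1);
    println!("{:#?}", trees);

    // `<` and `>` are ordinary tokens in this model, so no group is formed.
    let trees = parse(&mut lex("Vec < String >").into_iter(), None);
    assert_eq!(count_groups(&trees), 0);
}
```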

Good, now we have a feel for the input TokenStream. So what is the output TokenStream used for? Our query! macro returned a TokenStream for the hello() function; does that mean we can call it directly?

Try adding a call to hello() in main() and rerun the example. You’ll see the familiar “Hello world!” printed.

Congratulations! Your first procedural macro is completed!

It isn’t an impressive result, but through it we have learned the basic way to write a procedural macro and the basic structure of TokenStream/TokenTree.

Next, let’s implement a derive macro. Of the three kinds of procedural macros, this is the most useful one, and the one you will mostly write at work.

How to Construct Derive Macros #

We want to build a Builder derive macro that meets the following requirements from proc-macro-workshop (an exercise created by Rust expert David Tolnay to help people learn macro programming):

#[derive(Builder)]
pub struct Command {
    executable: String,
    args: Vec<String>,
    env: Vec<String>,
    current_dir: Option<String>,
}

fn main() {
    let command = Command::builder()
        .executable("cargo".to_owned())
        .args(vec!["build".to_owned(), "--release".to_owned()])
        .env(vec![])
        .build()
        .unwrap();
    assert!(command.current_dir.is_none());

    let command = Command::builder()
        .executable("cargo".to_owned())
        .args(vec!["build".to_owned(), "--release".to_owned()])
        .env(vec![])
        .current_dir("..".to_owned())
        .build()
        .unwrap();
    assert!(command.current_dir.is_some());
}

As you can see, simply deriving Builder on the Command struct gives it a builder() method that returns a CommandBuilder struct. CommandBuilder has methods named after the fields of Command; we can chain these methods and finally construct a Command with build().

We create examples/command.rs and add this code to it. Obviously, it does not compile yet. Let’s first write the missing code by hand to see what a complete program that lets main() run correctly looks like:

#[allow(dead_code)]
#[derive(Debug)]
pub struct Command {
    executable: String,
    args: Vec<String>,
    env: Vec<String>,
    current_dir: Option<String>,
}

#[derive(Debug, Default)]
pub struct CommandBuilder {
    executable: Option<String>,
    args: Option<Vec<String>>,
    env: Option<Vec<String>>,
    current_dir: Option<String>,
}

impl Command {
    pub fn builder() -> CommandBuilder {
        Default::default()
    }
}

impl CommandBuilder {
    pub fn executable(mut self, v: String) -> Self {
        self.executable = Some(v);
        self
    }

    pub fn args(mut self, v: Vec<String>) -> Self {
        self.args = Some(v);
        self
    }

    pub fn env(mut self, v: Vec<String>) -> Self {
        self.env = Some(v);
        self
    }

    pub fn current_dir(mut self, v: String) -> Self {
        self.current_dir = Some(v);
        self
    }

    pub fn build(mut self) -> Result<Command, &'static str> {
        Ok(Command {
            executable: self.executable.take().ok_or("executable must be set")?,
            args: self.args.take().ok_or("args must be set")?,
            env: self.env.take().ok_or("env must be set")?,
            current_dir: self.current_dir.take(),
        })
    }
}

fn main() {
    let command = Command::builder()
        .executable("cargo".to_owned())
        .args(vec!["build".to_owned(), "--release".to_owned()])
        .env(vec![])
        .build()
        .unwrap();
    assert!(command.current_dir.is_none());

    let command = Command::builder()
        .executable("cargo".to_owned())
        .args(vec!["build".to_owned(), "--release".to_owned()])
        .env(vec![])
        .current_dir("..".to_owned())
        .build()
        .unwrap();
    assert!(command.current_dir.is_some());
    println!("{:?}", command);
} 

This code is very simple; it is basically written by hand to match the usage in main(). You can see that much of it is repetitive, especially the methods of CommandBuilder, and this is exactly what a macro can generate automatically.

So how do you generate such code? Obviously, we need to extract information from the input TokenStream, meaning we need to extract the name of each field and its type from within the struct definition and then generate the corresponding method code.

If we consider code as strings, it is not difficult to imagine that we are actually generating the results we want through a template and the corresponding data. Generating HTML with a template might be familiar to many, but generating Rust code through a template is probably a new experience for you.

With this idea in mind, let’s try writing a Jinja template that generates the CommandBuilder struct. In Rust, the askama crate handles Jinja-style templates very efficiently. The template might look something like this:

#[derive(Debug, Default)]
pub struct {{ builder_name }} {
    {% for field in fields %}
    {{ field.name }}: Option<{{ field.ty }}>,
    {% endfor %}
}

Here, fields and builder_name are the parameters we need to pass in, and each field needs two properties, name and ty, which hold the field’s name and type. We can also generate the methods for this struct (name is the name of the original struct):

impl {{ builder_name }} {
    {% for field in fields %}
    pub fn {{ field.name }}(mut self, v: impl Into<{{ field.ty }}>) -> {{ builder_name }} {
        self.{{ field.name }} = Some(v.into());
        self
    }
    {% endfor %}

    pub fn build(self) -> Result<{{ name }}, &'static str> {
        Ok({{ name }} {
            {% for field in fields %}
            {% if field.optional %}
            {{ field.name }}: self.{{ field.name }},
            {% else %}
            {{ field.name }}: self.{{ field.name }}.ok_or("Build failed: missing {{ field.name }}")?,
            {% endif %}
            {% endfor %}
        })
    }
}

To avoid generating Option<Option<T>> for fields that are already of type Option<T>, we need to detect separately whether a field is an Option. If it is, ty holds the inner type T. So each field needs one more attribute, optional.
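As a rough sketch of that detection step, assume for the moment that the field type is available as a plain string. The helper split_option below is a hypothetical name, and working on strings is an assumption made purely for illustration; it only recognizes the exact spelling Option<...> and not, say, std::option::Option<...>.

```rust
// Hypothetical helper: operates on the type written as a string, which is an
// assumption made for illustration.
// Returns the type to store in `ty` and whether the field was an Option.
fn split_option(ty: &str) -> (String, bool) {
    let ty = ty.trim();
    match ty
        .strip_prefix("Option<")
        .and_then(|rest| rest.strip_suffix('>'))
    {
        // Option<T>: keep only the inner type T.
        Some(inner) => (inner.trim().to_string(), true),
        // Anything else is kept as written.
        None => (ty.to_string(), false),
    }
}

fn main() {
    assert_eq!(split_option("String"), ("String".to_string(), false));
    assert_eq!(split_option("Option<String>"), ("String".to_string(), true));
    assert_eq!(
        split_option("Option<Vec<String>>"),
        ("Vec<String>".to_string(), true)
    );
}
```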

Armed with this idea, we can construct our own data structures to describe Field:

#[derive(Debug, Default)]
struct Fd {
    name: String,
    ty: String,
    optional: bool,
}
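To see how this data would drive code generation without pulling in a template engine, here is a minimal sketch that renders part of the output by hand with format!. The functions render_builder and render_build_field are hypothetical stand-ins for what the template does; they are not the approach the lesson ends up using.

```rust
#[derive(Debug, Default)]
struct Fd {
    name: String,
    ty: String,
    optional: bool,
}

// Renders the builder struct definition; a hand-written stand-in for the template.
fn render_builder(builder_name: &str, fields: &[Fd]) -> String {
    let mut code = format!("#[derive(Debug, Default)]\npub struct {} {{\n", builder_name);
    for f in fields {
        // Every builder field is wrapped in Option, as in the template.
        code.push_str(&format!("    {}: Option<{}>,\n", f.name, f.ty));
    }
    code.push_str("}\n");
    code
}

// Renders one field initializer inside build(), mirroring the if/else in the template.
fn render_build_field(f: &Fd) -> String {
    if f.optional {
        format!("{0}: self.{0},", f.name)
    } else {
        format!("{0}: self.{0}.ok_or(\"Build failed: missing {0}\")?,", f.name)
    }
}

fn main() {
    let fields = vec![
        Fd { name: "executable".into(), ty: "String".into(), optional: false },
        Fd { name: "current_dir".into(), ty: "String".into(), optional: true },
    ];
    print!("{}", render_builder("CommandBuilder", &fields));
    for f in &fields {
        println!("{}", render_build_field(f));
    }
}
```

The resulting string could then be turned into a TokenStream with parse(), the same way the query! macro produced its output.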

Now with a template and data structures defined to provide data for the template, the core question we need to deal with next is: how do we extract the information we want from the TokenStream?

With this question in mind, let’s add a derive macro in lib.rs and print its input:

#[proc_macro_derive(RawBuilder)]
pub fn derive_raw_builder(input: TokenStream) -> TokenStream {
    println!("{:#?}", input);
    TokenStream::default()
}

Derive macros must be declared with the proc_macro_derive attribute. We name this derive macro RawBuilder. In examples/command.rs, we change the Command struct to derive RawBuilder (remember to use macros::RawBuilder):

use macros::RawBuilder;

#[allow(dead_code)]
#[derive(Debug, RawBuilder)]
pub struct Command {
    ...
}

After running this example, we’ll see a daunting TokenStream printout (it is quite long, so I won’t paste it here). Reading through it carefully, you can see:

  • It starts with the #[allow(dead_code)] attribute, whose content appears as a Group. The input is the item that the derive is attached to, so every attribute other than #[derive(...)] is included in the TokenStream.
  • Next are pub/struct/Command, three idents.
  • After that comes another Group, containing the information for each field. You’ll notice that fields are separated by a comma Punct, and the name and type of each field are separated by a colon Punct. The type may be an Ident, such as String