 

Rust proc_macro and syn: parse whitespace

I'm trying to write a proc macro, block!, that parses syntax vaguely resembling JSX/RSX using the syn crate. The following should be parsed into a block "wrapper" with two child blocks, "text_1" and "text_2":

let block = block! {
    <"wrapper">
      <"text_1">("hello")
      <"text_2">("world")
};

Using input.parse::<syn::Token![]>() when implementing syn::parse::Parse, I can parse everything all right.

However, there's one thing I can't find a solution for: I want to enforce a code style for the wrapper-children relation, so that a nested block must be indented by two spaces or a tab relative to its wrapper, e.g.:

// -- Good --
let block = block! {
  <"text_1">("hello")
  <"text_2">("world")
};

// -- Good --
let block = block! {
  <"wrapper">
    <"text_1">("hello")
    <"text_2">("world")
};

// -- Bad --
// Need spaces/tabs before the children tags
let block = block! {
  <"wrapper">
  <"text_1">("hello")
  <"text_2">("world")
};

Parsing the TokenStream with Token![<], parse::<LitStr>(), Token![>], parenthesized! and bracketed! doesn't enforce the intended rules, and I can't find anything about parsing tabs or spaces in the syn docs.

Judging by the TokenStream printed to the console, there is nothing at all between the wrapper's > and the first child's <:

Input: TokenStream [
    Punct {
        ch: '<',
        spacing: Alone,
        span: #0 bytes(118..119),
    },
    Literal {
        kind: Str,
        symbol: "wrapper",
        suffix: None,
        span: #0 bytes(119..128),
    },
    Punct {
        ch: '>',
        spacing: Alone,
        span: #0 bytes(128..129),
    },
    Punct {
        ch: '<',
        spacing: Alone,
        span: #0 bytes(144..145),
    },
    Literal {
        kind: Str,
        symbol: "text_1",
        suffix: None,
        span: #0 bytes(145..153),
    },
    Punct {
        ch: '>',
        spacing: Alone,
        span: #0 bytes(153..154),
    },

Any advice on how this could be done, if it's possible at all?

Slava.In asked Dec 07 '25 07:12


1 Answer

Indentation is not significant in Rust's syntax, and whitespace is only occasionally required to separate what would otherwise be a single token into two tokens (think let name versus letname). Because its only functional purpose is to help split the source into tokens, whitespace is not itself a token.
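To make that concrete, here is a toy tokenizer in which whitespace behaves as described: it ends the current word and is then discarded, so indentation cannot survive into the token list. It is a hypothetical illustration only, far simpler than rustc's real lexer (it does not even handle string literals), and the name toy_tokens is made up.

```rust
/// Toy tokenizer, NOT rustc's lexer: words are runs of alphanumerics/underscores,
/// every other non-whitespace char is a one-char punctuation token, and
/// whitespace only ends the current word before being discarded.
fn toy_tokens(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            // Identifier characters accumulate into a single token.
            word.push(ch);
            continue;
        }
        // Anything else ends the current word, if there is one.
        if !word.is_empty() {
            tokens.push(std::mem::take(&mut word));
        }
        // Punctuation becomes its own token; whitespace is simply dropped.
        if !ch.is_whitespace() {
            tokens.push(ch.to_string());
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}
```

With this model, toy_tokens("let name") yields two tokens while toy_tokens("letname") yields one, and "<a>\n      <b>" produces exactly the same tokens as "<a><b>". That mirrors why the TokenStream in the question shows nothing between > and <.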

If your desired macro syntax relies on indentation, then you should use a string literal instead:

let block = block!(r#"
    <"wrapper">
        <"text_1">("hello")
        <"text_2">("world")
"#);
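A minimal sketch of that string-literal route, assuming the macro has already extracted the literal's contents (for example via syn's LitStr::value()). The Block type and the parse_blocks, parse_line, attach and indent_width helpers are hypothetical, and counting a tab as two columns is an assumption. Nesting is inferred purely from indentation: a line indented deeper than the currently open block becomes its child.

```rust
/// Hypothetical tree type for the string-literal syntax.
#[derive(Debug, PartialEq)]
struct Block {
    name: String,
    text: Option<String>,
    children: Vec<Block>,
}

/// Indentation width of a line; a tab is ASSUMED to count as two columns.
fn indent_width(line: &str) -> usize {
    line.chars()
        .take_while(|c| *c == ' ' || *c == '\t')
        .map(|c| if c == '\t' { 2 } else { 1 })
        .sum()
}

/// Parses `<"name">` or `<"name">("text")`; returns None for anything else.
fn parse_line(line: &str) -> Option<(String, Option<String>)> {
    let rest = line.trim().strip_prefix("<\"")?;
    let (name, rest) = rest.split_once("\">")?;
    let text = match rest.strip_prefix("(\"").and_then(|r| r.strip_suffix("\")")) {
        Some(t) => Some(t.to_string()),
        None if rest.is_empty() => None,
        None => return None, // trailing junk after the tag
    };
    Some((name.to_string(), text))
}

/// Attaches a finished block to its parent (top of the stack) or to the roots.
fn attach(roots: &mut Vec<Block>, stack: &mut [(usize, Block)], block: Block) {
    match stack.last_mut() {
        Some((_, parent)) => parent.children.push(block),
        None => roots.push(block),
    }
}

/// Builds a block tree where nesting is given purely by indentation.
fn parse_blocks(src: &str) -> Result<Vec<Block>, String> {
    let mut roots = Vec::new();
    // Currently open blocks, outermost first, with their indentation widths.
    let mut stack: Vec<(usize, Block)> = Vec::new();
    for (no, line) in src.lines().enumerate() {
        if line.trim().is_empty() {
            continue;
        }
        let indent = indent_width(line);
        let (name, text) = parse_line(line)
            .ok_or_else(|| format!("line {}: cannot parse {:?}", no + 1, line))?;
        // Close every open block indented as much as (or more than) this line.
        while stack.last().is_some_and(|(i, _)| *i >= indent) {
            let (_, done) = stack.pop().unwrap();
            attach(&mut roots, &mut stack, done);
        }
        stack.push((indent, Block { name, text, children: Vec::new() }));
    }
    // Close whatever is still open at the end of the input.
    while let Some((_, done)) = stack.pop() {
        attach(&mut roots, &mut stack, done);
    }
    Ok(roots)
}
```

With this approach the question's "Bad" layout is not an error: it simply parses as three sibling blocks. Rejecting it would need an extra rule, such as requiring a tag without (...) content to have children.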

However, there is a hacky way to get what you want, since the Spans associated with tokens have a .source_text() method. Here's a simple macro that takes the source text and turns it into a string literal:

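// Proc-macro crate `macros` (its Cargo.toml needs `[lib] proc-macro = true`)
use proc_macro::{Literal, TokenStream, TokenTree};

#[proc_macro]
pub fn block(tokens: TokenStream) -> TokenStream {
    // The input is expected to be a single `{ ... }` group; its span covers
    // the whole body, so source_text() returns it verbatim, whitespace included.
    let source = tokens
        .into_iter()
        .next()
        .expect("block! expects a `{ ... }` group")
        .span()
        .source_text()
        .expect("span does not correspond to real source code");

    // Hand the raw text back as a string literal so it can be inspected.
    TokenTree::Literal(Literal::string(&source)).into()
}

Binary crate, src/main.rs:

use macros::block;

fn main() {
    let block = block! { {
        <"wrapper">
          <"text_1">("hello")
          <"text_2">("world")
    } };

    dbg!(block);
}

Output:

[src/main.rs:10] block = "{\n        <\"wrapper\">\n          <\"text_1\">(\"hello\")\n          <\"text_2\">(\"world\")\n    }"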

You could then split it into lines and compare the leading whitespace within your procedural macro. Do notice, though, that I added an extra { } because a TokenStream itself doesn't have an encompassing .span() available.
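A rough sketch of that line-by-line check, meant to run on text like the dbg! output above. The function names and the rules themselves are assumptions about what the question wants: a wrapper (a tag with no (...) content) must be followed by a line exactly one level deeper, a leaf must not be followed by a deeper line, and one level is two spaces or one tab (a tab is counted as two columns). Inside the real macro, an Err could be reported by emitting a compile_error! invocation.

```rust
/// Indentation width of a line; a tab is ASSUMED to count as two columns.
fn indent_width(line: &str) -> usize {
    line.chars()
        .take_while(|c| *c == ' ' || *c == '\t')
        .map(|c| if c == '\t' { 2 } else { 1 })
        .sum()
}

/// Hypothetical style check over the text returned by Span::source_text().
/// Assumed rules: a wrapper tag (no `(...)` content) must be followed by a line
/// indented exactly one level (2 columns) deeper; a leaf tag must not be
/// followed by a deeper line.
fn check_indentation(source: &str) -> Result<(), String> {
    // Strip the extra `{ }` that wraps the macro input.
    let body = source
        .trim()
        .strip_prefix('{')
        .and_then(|s| s.strip_suffix('}'))
        .ok_or("expected input wrapped in `{ ... }`")?;

    // (indent width, is_wrapper) of the previous non-blank line.
    let mut prev: Option<(usize, bool)> = None;
    for (no, line) in body.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let width = indent_width(line);
        // `<"wrapper">` ends with '>', `<"text_1">("hello")` ends with ')'.
        let is_wrapper = trimmed.ends_with('>');
        if let Some((prev_width, prev_wrapper)) = prev {
            if prev_wrapper && width != prev_width + 2 {
                return Err(format!(
                    "line {}: children must be indented by two spaces or a tab",
                    no + 1
                ));
            }
            if !prev_wrapper && width > prev_width {
                return Err(format!("line {}: unexpected indentation after a leaf", no + 1));
            }
        }
        prev = Some((width, is_wrapper));
    }
    Ok(())
}
```

Only relative indentation is compared, so the absolute offset of the macro call inside fn main (eight spaces in the output above) does not matter.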

You should also heed the documentation, which says this is not an intended use case:

Returns the source text behind a span. This preserves the original source code, including spaces and comments. It only returns a result if the span corresponds to real source code.

Note: The observable result of a macro should only rely on the tokens and not on this source text. The result of this function is a best effort to be used for diagnostics only.

I do not encourage you to actually do this. And however good your intentions are, a macro should not be enforcing style.

kmdreko answered Dec 08 '25 19:12