 

Ambiguity of PEG grammar with PEST parser

Tags: rust, peg

I'm trying to write a PEG for an old file format that has about 100 keywords which can't be used as identifiers.

Here's an example of a keyword rule:

IN = { ^"in" } // Caret means case insensitivity

keyword = { IN } // plus others

The identifier rule looks like this:

identifier = @{ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }

As written, this identifier rule also matches every keyword, so I added a negative lookahead on keyword:

identifier = @{ !keyword ~ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }

This mostly works, but it fails when an identifier begins with the same letters as a keyword. For example, the identifier inner is treated as the keyword in followed by more text, so it is rejected as an identifier.
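To make the failure concrete, here is a minimal sketch in plain Rust that mimics the two rules above by hand. It does not use pest itself; the function names keyword_prefix and identifier are made up for illustration, and only the single keyword in is modelled.

```rust
// Hand-written model of the rules above, standard library only.
// This is an illustration of the PEG semantics, not pest's implementation.

/// keyword = { ^"in" }: succeeds if the input *starts with* "in", ignoring case.
fn keyword_prefix(input: &str) -> bool {
    input
        .get(..2)
        .map_or(false, |head| head.eq_ignore_ascii_case("in"))
}

/// identifier = @{ !keyword ~ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }
/// Returns the matched prefix of `input`, or None if the rule fails.
fn identifier(input: &str) -> Option<&str> {
    // `!keyword` only checks a prefix, so "inner" is rejected here
    // because it starts with "in".
    if keyword_prefix(input) {
        return None;
    }
    let first = input.chars().next()?;
    if !(first == '_' || first.is_ascii_alphabetic()) {
        return None;
    }
    // The first character is ASCII, so slicing at byte 1 is safe.
    let rest_len = input[1..]
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric())
        .count();
    Some(&input[..1 + rest_len])
}
```

In this model, identifier("outer") returns Some("outer"), while identifier("inner") returns None, because the lookahead has no notion of where the keyword ends.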

How can I allow identifiers that begin with a keyword? Note that in the pest parser generator, terminals can only be specified as strings, not as regular expressions.

oorst asked Oct 24 '25 09:10


1 Answer

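You can force keyword to match only whole words by following it with a negative lookahead predicate (!), so that a keyword is rejected when an identifier character comes right after it. For example: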

identifier_start = _{ "_" | ASCII_ALPHA }
identifier_continue = _{ "_" | ASCII_ALPHANUMERIC }

keyword = @{ (^"for" | ^"in") ~ !identifier_continue }
identifier = @{ !keyword ~ identifier_start ~ identifier_continue* ~ !identifier_continue }

With these rules, keyword matches for and in, but not form or int, which are parsed as identifiers instead.
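If it helps to see the lookahead logic outside the grammar, the following is a rough sketch in plain Rust that imitates these four rules by hand. It is not pest code and says nothing about how pest works internally; the function names simply mirror the rule names, and the keyword list is limited to for and in as in the example.

```rust
// Hand-written model of the grammar above, standard library only.

/// identifier_start = _{ "_" | ASCII_ALPHA }
fn is_identifier_start(c: char) -> bool {
    c == '_' || c.is_ascii_alphabetic()
}

/// identifier_continue = _{ "_" | ASCII_ALPHANUMERIC }
fn is_identifier_continue(c: char) -> bool {
    c == '_' || c.is_ascii_alphanumeric()
}

/// keyword = @{ (^"for" | ^"in") ~ !identifier_continue }
/// Returns the matched keyword text, or None if the rule fails.
fn keyword(input: &str) -> Option<&str> {
    // Ordered choice: take the first literal that matches, ignoring case.
    let head = ["for", "in"].iter().find_map(|kw| {
        input.get(..kw.len()).filter(|h| h.eq_ignore_ascii_case(kw))
    })?;
    // Negative lookahead: fail if an identifier character follows the keyword.
    match input[head.len()..].chars().next() {
        Some(c) if is_identifier_continue(c) => None,
        _ => Some(head),
    }
}

/// identifier = @{ !keyword ~ identifier_start ~ identifier_continue* ~ !identifier_continue }
/// Returns the matched prefix of `input`, or None if the rule fails.
fn identifier(input: &str) -> Option<&str> {
    if keyword(input).is_some() {
        return None; // `!keyword` fails only on a whole-word keyword
    }
    let mut chars = input.chars();
    if !chars.next().map_or(false, is_identifier_start) {
        return None;
    }
    // Greedy repetition; every matched character is ASCII, so the char
    // count equals the byte length. The trailing `!identifier_continue`
    // holds automatically once the repetition has consumed all it can.
    let len = 1 + chars.take_while(|&c| is_identifier_continue(c)).count();
    Some(&input[..len])
}
```

Here keyword("for x") returns Some("for") and identifier("in") returns None, while identifier("inner"), identifier("form") and identifier("int") each return the whole word.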

L. F. answered Oct 26 '25 04:10