My colleague PaulS asked me the following:
I'm writing a parser for an existing language (SystemVerilog - an IEEE standard), and the specification has a rule in it that is similar in structure to this:
cover_point 
    = 
    [[data_type] identifier ':' ] 'coverpoint' identifier ';' 
    ;
data_type 
    = 
    'int' | 'float' | identifier 
    ;
identifier 
    = 
    ?/\w+/? 
    ;
The problem is that when parsing the following legal string:
anIdentifier: coverpoint another_identifier;
anIdentifier matches with data_type (via its identifier option) successfully, which means Grako is looking for another identifier after it and then fails. It doesn't then try to parse without the data_type part.
I can re-write the rule as follows,
cover_point_rewrite  
    = 
    [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';' 
    ;
but I wonder: is this a PEG-in-general issue, or a tool (Grako) one?
It says here that in PEGs the choice operator is ordered: the parser takes the first alternative that matches, which avoids the ambiguities of CFGs.
In your first example, [data_type] succeeds in parsing the id, so the rule fails when it finds ':' instead of another identifier.
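That may be because [data_type] behaves like (data_type | ε): the ordered choice commits to data_type as soon as it matches the first id, and a failure later in the same sequence does not send the parser back to try the empty alternative.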
In [data_type identifier ':' | identifier ':' ], the first choice fails when there is no second id, so the parser backtracks and tries the second choice.