
Convert BNF grammar to pyparsing

How can I describe the grammar of the scripting language below (given in Backus–Naur Form) using regex, or would pyparsing be a better fit?

<root>   :=     <tree> | <leaves>
<tree>   :=     <group> [* <group>] 
<group>  :=     "{" <leaves> "}" | <leaf>;
<leaves> :=     {<leaf>;} <leaf>
<leaf>   :=     <name> = <expression>{;}

<name>          := <string_without_spaces_and_tabs>
<expression>    := <string_without_spaces_and_tabs>

Example of the script:

{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]

I use Python's re.compile and want to divide everything into groups, something like this:

[ [ 'stage',       '3'],
  [ 'some.param1', '[10, 20]'] ],

[ ['stage',  '4'],
  ['param3', '[100,150,200,250,300]'] ],

[ ['endparam', '[0, 1]'] ]

Update: I've found that pyparsing is a much better solution than regex.

Max Tkachenko asked Jan 13 '15 16:01



1 Answer

Pyparsing lets you simplify constructs such as

leaves :: {leaf} leaf

to just

OneOrMore(leaf)
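
As a minimal sketch of that simplification, the snippet below uses a deliberately simplified, hypothetical leaf (alphanumeric names and values only, which is an assumption made for illustration and not part of the question's grammar) to show that OneOrMore accepts both a single leaf and a run of leaves:

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

# Hypothetical, simplified leaf for illustration only: name = value;
EQ, SEMI = map(Suppress, "=;")
leaf = Group(Word(alphanums + ".") + EQ + Word(alphanums) + SEMI)

# The BNF "{leaf} leaf" (zero or more leaves, then one more)
# is the same thing as "one or more leaves"
leaves = OneOrMore(leaf)

one = leaves.parseString("a = 1;").asList()
many = leaves.parseString("a = 1; b.c = 2; d = 3;").asList()
print(one)
print(many)
```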

So one form of your BNF in pyparsing will look something like:

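from pyparsing import (Group, OneOrMore, Suppress, Word, ZeroOrMore,
                       printables, quotedString)

# Punctuation is matched but suppressed from the parsed results
LBRACE,RBRACE,EQ,SEMI = map(Suppress, "{}=;")
name = Word(printables, excludeChars="{}=;")
# quotedString must be tried first: the Word alternative would otherwise
# consume the opening quote and stop at the first excluded character
expr = quotedString | Word(printables, excludeChars="{}=;")

# Group keeps each leaf and each braced block as its own nested list
leaf = Group(name + EQ + expr + SEMI)
group = Group(LBRACE + ZeroOrMore(leaf) + RBRACE) | leaf
tree = OneOrMore(group)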

I added quotedString as an alternative expr in case you want a value that includes one of the excluded characters. Wrapping leaf and group in Group preserves the bracing structure in the parsed results.
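
To illustrate the quotedString alternative, here is a small self-contained sketch. It assumes quotedString is tried before the Word alternative, because Word(printables, ...) also accepts a leading quote character and would otherwise stop at the first excluded character inside the quotes. The input line is made up for the example.

```python
from pyparsing import Group, Suppress, Word, printables, quotedString

EQ, SEMI = map(Suppress, "=;")
name = Word(printables, excludeChars="{}=;")
# Quoted alternative first, plain word as the fallback
expr = quotedString | Word(printables, excludeChars="{}=;")
leaf = Group(name + EQ + expr + SEMI)

# A quoted value may contain ';', '=', braces and spaces
quoted_leaf = leaf.parseString('title = "a;b = {c}";').asList()
# An unquoted value still goes through the Word alternative
plain_leaf = leaf.parseString('stage = 3;').asList()
print(quoted_leaf)
print(plain_leaf)
```

The surrounding quotes are kept in the parsed value here; pyparsing's removeQuotes parse action can strip them if that is not wanted.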

Unfortunately, your sample doesn't quite conform to this BNF:

  1. spaces in [10, 20] and [0, 1] make them invalid exprs

  2. some leaves do not have terminating ;s

  3. lone * characters between groups, which this parser does not handle

This sample does parse successfully with the above parser:

sample = """
{
 stage = 3;
 some.param1 = [10,20];
}
{
 stage = 4;
 param3 = [100,150,200,250,300];
}
 endparam = [0,1];
 """

parsed = tree.parseString(sample)    
parsed.pprint()

Giving:

[[['stage', '3'], ['some.param1', '[10,20]']],
 [['stage', '4'], ['param3', '[100,150,200,250,300]']],
 ['endparam', '[0,1]']]
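
If the original sample has to be accepted as written, the grammar can be relaxed. The following is only a sketch under stated assumptions, not a definitive reading of the BNF: it assumes * is a separator between groups (as the tree rule suggests), that the trailing ; on a leaf is optional, and that a bracketed list may contain spaces but no nested brackets. The names bracketed, bare and braced are introduced here for illustration.

```python
from pyparsing import (Group, OneOrMore, Optional, Regex, Suppress, Word,
                       ZeroOrMore, printables)

LBRACE, RBRACE, EQ, SEMI, STAR = map(Suppress, "{}=;*")
name = Word(printables, excludeChars="{}=;*")

# Assumption: a bracketed list may contain spaces but no nested brackets
bracketed = Regex(r"\[[^\]]*\]")
expr = bracketed | Word(printables, excludeChars="{}=;*")

# Assumption: the terminating ';' is optional
leaf = Group(name + EQ + expr + Optional(SEMI))

braced = Group(LBRACE + ZeroOrMore(leaf) + RBRACE)
bare = Group(OneOrMore(leaf))      # unbraced leaves form their own group
group = braced | bare

# Assumption: '*' separates consecutive groups
tree = group + ZeroOrMore(STAR + group)

original_sample = """
{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]
"""

# parseAll=True raises an exception if any input is left unparsed
result = tree.parseString(original_sample, parseAll=True).asList()
for grp in result:
    print(grp)
```

With these assumptions, each braced block and the trailing unbraced leaves come back as separate lists of [name, expression] pairs, which is the grouping the question asks for.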

PaulMcG answered Sep 20 '22 15:09
