I'm trying to write PyParsing code capable of parsing any Python code (I know that the AST module exists, but that will just be a starting point - I ultimately want to parse more than just Python code.)
Anyways, I figure I'll just start by writing something able to parse the classic
print("Hello World!")
So here's what I wrote:
from pyparsing import (alphanums, alphas, delimitedList, Forward,
quotedString, removeQuotes, Suppress, Word)
expr = Forward()
string = quotedString.setParseAction(removeQuotes)
call = expr + Suppress('(') + Optional(delimitedList(expr)) + Suppress(')')
name = World(alphas + '_', alphanums + '_')
expr <<= string | name | call
test = 'print("Hello World!")'
print(expr.parseString(test))
When I do that, though, it just spits out:
['print']
Which is technically a valid expr - you can type that into the REPL and there's no problem parsing it, even if it's useless.
So I thought maybe what I would want is to flip around name and call in my expr definition, so it would prefer returning calls to names, like this:
expr <<= string | call | name
Now I get a maximum recursion depth exceeded error. That makes sense, too:
expr.
string, it's not.call.
expr, return to start of outer list.So my question is... how can I define call and expr so that I don't end up with an infinite recursion, but also so that it doesn't just stop when it sees the name and ignore the arguments?
Is Python code too complicated for PyParsing to handle? If not, is there any limit to what PyParsing can handle?
(Note - I've included the general tags parsing, abstract-syntax-tree, and bnf, because I suspect this is a general recursive grammar definition problem, not something necessarily specific to pyparsing.)
Your grammar is left recursive: expr expects a call which expects an expr which expects a call... If PyParsing can't handle left recursion, you need to change the grammar to something that PyParsing can work with.
One approach to remove direct left recursion is to change a gramar rule such us:
A = A b | c
into
A = c b*
In your case, left recursion is indirect: it doesn't happen in expr, but in a sub rule (call):
E = C | s | n
C = E x y z
To remove indirect left recursion you usually "lift" the definition of the sub-rule to the main rule. Unfortunatelly this removes the offending sub rule from the grammar -- in other words, you lose some structural expressiveness when you do that.
The previous example, with indirect recursion removed, would look like this:
E = E x y z | s | n
At this point, you have direct left recursion, which is easier to transform. When you deal with that, the result would be something like this -- in pseudo EBNF:
E = (s | n) (x y z)*
In your case, the definition of Expr would become:
Expr = (string | name) Args*
Args = "(" ExprList? ")"
ExprList = Expr ("," Expr)*
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With