Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse Python Code using PyParsing?

I'm trying to write PyParsing code capable of parsing any Python code (I know that the AST module exists, but that will just be a starting point - I ultimately want to parse more than just Python code.)

Anyways, I figure I'll just start by writing something able to parse the classic

print("Hello World!")

So here's what I wrote:

from pyparsing import (alphanums, alphas, delimitedList, Forward,
                       quotedString, removeQuotes, Suppress, Word)

expr = Forward()
string = quotedString.setParseAction(removeQuotes)
call = expr + Suppress('(') + Optional(delimitedList(expr)) + Suppress(')')
name = World(alphas + '_', alphanums + '_')
expr <<= string | name | call

test = 'print("Hello World!")'
print(expr.parseString(test))

When I do that, though, it just spits out:

['print']

Which is technically a valid expr - you can type that into the REPL and there's no problem parsing it, even if it's useless.

So I thought maybe what I would want is to flip around name and call in my expr definition, so it would prefer returning calls to names, like this:

expr <<= string | call | name

Now I get a maximum recursion depth exceeded error. That makes sense, too:

  1. Checks if it's an expr.
    1. Checks if it's a string, it's not.
    2. Checks if it's a call.
      1. It must start with an expr, return to start of outer list.

So my question is... how can I define call and expr so that I don't end up with an infinite recursion, but also so that it doesn't just stop when it sees the name and ignore the arguments?

Is Python code too complicated for PyParsing to handle? If not, is there any limit to what PyParsing can handle?

(Note - I've included the general tags parsing, abstract-syntax-tree, and bnf, because I suspect this is a general recursive grammar definition problem, not something necessarily specific to pyparsing.)

like image 568
ArtOfWarfare Avatar asked Apr 25 '26 05:04

ArtOfWarfare


1 Answers

Your grammar is left recursive: expr expects a call which expects an expr which expects a call... If PyParsing can't handle left recursion, you need to change the grammar to something that PyParsing can work with.

One approach to remove direct left recursion is to change a gramar rule such us:

A = A b | c

into

A = c b*

In your case, left recursion is indirect: it doesn't happen in expr, but in a sub rule (call):

E = C | s | n
C = E x y z

To remove indirect left recursion you usually "lift" the definition of the sub-rule to the main rule. Unfortunatelly this removes the offending sub rule from the grammar -- in other words, you lose some structural expressiveness when you do that.

The previous example, with indirect recursion removed, would look like this:

E = E x y z | s | n

At this point, you have direct left recursion, which is easier to transform. When you deal with that, the result would be something like this -- in pseudo EBNF:

E = (s | n) (x y z)*

In your case, the definition of Expr would become:

Expr = (string | name) Args*
Args = "(" ExprList? ")"
ExprList = Expr ("," Expr)*
like image 108
Branco Medeiros Avatar answered Apr 27 '26 17:04

Branco Medeiros