Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using python AST to traverse code and extract return statements

I have a python script that I am trying to "decode" as it were so that I can cast it as an xml, but for this exercise I am just trying to get my head around using ast.walk() and how best to get info out of it. Below is a very simple function that I wrote to simple_if.py and what I am trying to do is extract the function name and the return statements as a list in order.

def if_else(abc):
    if abc > 0:
        return "Hello"
    elif abc < 0:
        return "Goodbye"
    return "Neither Hello Nor Goodbye"

When I put the above into the file and run the below ast functions on it like so

with open("simple_if.py", "r", encoding = "utf-8") as ast_tree_walker
    tree = ast.parse(ast_tree_walker.read())

expList = []
counter = 0

for node in ast.walk(tree):
    print(node)
    if isInstance(node, ast.FunctionDef) and counter == 0:
        expList.append(node.__dict__["name"]
        counter = 1 # this adds the functiona name if_else to the list 
    print(node.__dict__)

The above would give a terminal output of nodes, dicts, and args, but the only thing I have been able to accomplish is extracting the function name since it is the first non-Module node in the tree. I know I am dumping everything that I "want" but its not clear to me how I should keep track of the ordering for instance "Neither Hello Nor Goodbye" would appear before either of "Hello" or "Goodbye" and although I assume that is because in terms of a tree the final return is on the same level as the if statement it is not clear to me how I can maintain order (if at all). Basically does anyone know how I could return a list of ["if_else", "Hello", "Goodbye", "Neither Hello Nor Goodbye"]? On some level I feel this is a fools errand, but in the pursuit of knowledge I am wondering how to go about this.

Nuanced question the above prints out types of:

<ast.Module object at (random token string that I won't write)>, <ast.FunctionDef object at >, <ast.If object at >, <ast.Return object at >... 

When I come across a "<ast.If object at >" should I step into the 'body' key directly or somehow just wait till that child node gets visited and then extract the info?

#children of ast.If
{'test': <ast.Compare object at>, 'body" <ast.Return object at >, 'orelse': [<ast.If object at>, 'lineno':2, 'col_offset':7}
like image 338
user3782816 Avatar asked Oct 29 '25 10:10

user3782816


1 Answers

Although the documentation claims that ast.walk generates nodes "in no specified order" and even misleadingly characterizes its behavior as "Recursively yield all descendant nodes...", a quick look at the source code reveals that it really performs a simple breadth-first traversal by iteratively yielding child nodes in a queue:

def walk(node):
    """
    Recursively yield all descendant nodes in the tree starting at *node*
    (including *node* itself), in no specified order.  This is useful if you
    only want to modify nodes in place and don't care about the context.
    """
    from collections import deque
    todo = deque([node])
    while todo:
        node = todo.popleft()
        todo.extend(iter_child_nodes(node))
        yield node

So the node for return "Neither Hello Nor Goodbye" is yielded before the node for return "Hello" because the former is an immediate child of def if_else(...) while the latter is a child of its child.

To produce the nodes in your desired depth-first order you can write your own recursive node traversal function instead:

import ast

def dfs_walk(node):
    yield node
    for child in ast.iter_child_nodes(node):
        yield from dfs_walk(child)

so that:

tree = ast.parse('''\
def if_else(abc):
    if abc > 0:
        return "Hello"
    elif abc < 0:
        return "Goodbye"
    return "Neither Hello Nor Goodbye"
''')

expList = []
for node in dfs_walk(tree):
    if isinstance(node, ast.FunctionDef):
        expList.append(node.name)
    elif isinstance(node, ast.Return):
        expList.append(node.value.value)
print(expList)

outputs:

['if_else', 'Hello', 'Goodbye', 'Neither Hello Nor Goodbye']

Demo here

like image 198
blhsing Avatar answered Oct 31 '25 00:10

blhsing