Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get python function source excluding the docstring?

Tags:

python

You might want to have the docstring not affect the hash for example like in joblib memory.

Is there a good way of stripping the docstring? inspect.getsource and inspect.getdoc kind of fight each other: the docstring is "cleaned" in one.

like image 886
mathtick Avatar asked Oct 15 '25 04:10

mathtick


2 Answers

In case anyone is still looking for a solution for this, this is how I managed to build it:

from ast import Constant, Expr, FunctionDef, Module, parse
from inspect import getsource
from textwrap import dedent
from types import FunctionType
from typing import cast


def get_source_without_docstring(obj: FunctionType) -> str:
    # Get cleanly indented source code of the function
    obj_source = dedent(getsource(obj))

    # Parse the source code into an Abstract Syntax Tree.
    # The root of this tree is a Module node.
    module: Module = parse(obj_source)

    # The first child of a Module node is FunctionDef node that represents
    # the function definition. We cast module.body[0] to FunctionDef for type safety.
    function_def = cast(FunctionDef, module.body[0])

    # The first statement of a function could be a docstring, which in AST
    # is represented as an Expr node. To remove the docstring, we need to find
    # this Expr node.
    first_stmt = function_def.body[0]

    # Check if the first statement is a docstring (a constant str expression)
    if (
        isinstance(first_stmt, Expr)
        and isinstance(first_stmt.value, Constant)
        and isinstance(first_stmt.value.value, str)
    ):
        # Split the original source code by lines
        code_lines: list[str] = obj_source.splitlines()

        # Delete the lines corresponding to the docstring from the list.
        # Note: We are using 0-based list index, but the line numbers in the
        # parsed AST nodes are 1-based. So, we need to subtract 1 from the
        # 'lineno' property of the node.
        del code_lines[first_stmt.lineno - 1 : first_stmt.end_lineno]

        # Join the remaining lines back into a single string
        obj_source = "\n".join(code_lines)

    # Return the source code of function without docstrings
    return obj_source

Note: code by myself, comments by OpenAI's GPT

like image 116
Leandro Lima Avatar answered Oct 17 '25 16:10

Leandro Lima


If you just want to hash the body of a function, regardless of the docstring, you can use the function.__code__ attribute.

It gives access to a code object which is not affected by the docstring.

unfortunately, using this, you will not be able to get a readable version of the source

def foo():
    """Prints 'foo'"""
    print('foo')


print(foo.__doc__)  # Prints 'foo'
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
foo.__doc__ += 'pouet'
print(foo.__doc__)  # Prints 'foo'pouet
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
like image 42
Tryph Avatar answered Oct 17 '25 18:10

Tryph



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!