
Replacing macro-style class method with a decorator?

I'm having a lot of trouble getting a good grasp on decorators despite having read many an article on the subject (including [this][1] very popular one on SO). I'm suspecting I must be stupid, but with all the stubbornness that comes with being stupid, I've decided to try to figure this out.

That, and I suspect I have a good use case...

Below is some code from a project of mine that extracts text from PDF files. Processing involves three steps:

  1. Set up PDFMiner objects needed for processing of PDF file (boilerplate initializations).
  2. Apply a processing function to the PDF file.
  3. No matter what happens, close the file.

I recently learned about context managers and the with statement, and this seemed like a good use case for them. As such, I started by defining the PDFMinerWrapper class:

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        parser = PDFParser(self.pdf)  # create a parser object associated with the file object
        doc = PDFDocument()  # create a PDFDocument object that stores the document structure
        parser.set_document(doc)  # connect the parser and document objects
        doc.set_parser(parser)
        doc.initialize(self.pdf_pwd)  # pass '' if no password required
        return doc

    def __exit__(self, type, value, traceback):
        self.pdf.close()
        # if we have an error, catch it, log it, and return the info
        if isinstance(value, Exception):
            self.logError()
            print traceback
            return value

Now I can easily work with a PDF file and be sure that it will handle errors gracefully. In theory, all I need to do is something like this:

with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)
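
One detail of `__exit__` is worth checking here: a truthy return value tells Python to swallow the exception, and an exception instance is truthy, so `return value` suppresses any error raised inside the `with` block. Below is a minimal sketch of that behaviour. `Swallow`, `Propagate`, and `run` are hypothetical names made up for the demonstration and are not part of the project:

```python
class Swallow(object):
    """Hypothetical context manager whose __exit__ returns the exception."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # An exception instance is truthy, so the error is suppressed.
        return exc_value


class Propagate(object):
    """Hypothetical context manager whose __exit__ returns None."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Returning None (falsy) lets the exception continue to the caller.
        return None


def run(manager):
    """Report whether an error raised inside the with block escapes it."""
    try:
        with manager:
            raise ValueError("boom")
    except ValueError:
        return "raised"
    return "suppressed"
```

If the caller should still see the error after it has been logged, returning `None` (or `False`) from `__exit__` would be the variant to use.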

This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.

I'm working with a class I call Pamplemousse which serves as an interface to work with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object has been linked.

Here is some (abridged) code that demonstrates its use:

class Pamplemousse(object):
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc

    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)

        return result

    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc

    def get_toc(self):
        return self.with_pdf(self._parse_toc)

Any time I wish to perform an operation on the PDF file, I pass the relevant function to the with_pdf method along with its arguments. The with_pdf method, in turn, uses the with statement to exploit the context manager of PDFMinerWrapper (thus ensuring graceful handling of exceptions) and executes the check before actually applying the function it has been passed.

My question is as follows:

I would like to simplify this code such that I do not have to explicitly call Pamplemousse.with_pdf. My understanding is that decorators could be of help here, so:

  1. How would I implement a decorator whose job would be to call the with statement and execute the extractability check?
  2. Is it possible for a decorator to be a class method, or must my decorator be a free-form function or class?
Louis Thibault asked Nov 19 '25 23:11



1 Answer

The way I interpreted your goal was that you want to define multiple methods on your Pamplemousse class without constantly having to wrap them in that with_pdf call. Here is a really simplified version of what it might look like:

def if_extractable(fn):
    # this expects to be wrapping a Pamplemousse object
    def wrapped(self, *args):
        print "wrapper(): Calling %s with" % fn, args
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped


class Pamplemousse(object):

    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print "get_toc():", self, doc, args

The decorator if_extractable is just a function, but it expects to be used on instance methods of your class.

The decorated get_toc, which used to delegate to a private method, now simply expects to receive a doc object and the args, and it only runs if the check passes. Otherwise it is never called and the wrapper returns None.

With this, you can keep defining your operation functions to expect a doc argument.
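
As a rough illustration of that pattern, here is a self-contained sketch that runs without PDFMiner. `FakeDoc` and `FakeWrapper` are hypothetical stand-ins for the real document object and `PDFMinerWrapper` (the rule that only paths ending in 'ok.pdf' are extractable is invented for the demo). The use of `functools.wraps`, `**kwargs`, and forwarding `passwd` are additions that the simplified version above does not have:

```python
import functools


class FakeDoc(object):
    """Hypothetical stand-in for the PDFMiner document object."""
    def __init__(self, extractable):
        self.is_extractable = extractable


class FakeWrapper(object):
    """Hypothetical stand-in for PDFMinerWrapper; opens nothing on disk."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        # Assumption for the demo: only paths ending in 'ok.pdf' are extractable.
        return FakeDoc(self.pdf_doc.endswith('ok.pdf'))

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions


def if_extractable(fn):
    @functools.wraps(fn)  # keep the wrapped method's name and docstring
    def wrapped(self, *args, **kwargs):
        with FakeWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:
                return fn(self, doc, *args, **kwargs)
        return None  # check failed, so fn was never called
    return wrapped


class Pamplemousse(object):
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    @if_extractable
    def get_toc(self, doc):
        return 'toc of %s' % self.pdf_doc

    @if_extractable
    def count(self, doc, start, step=1):
        return start + step
```

Callers never pass `doc` themselves: `Pamplemousse('ok.pdf').get_toc()` receives it from the wrapper, while the same call on a path the stand-in treats as non-extractable returns `None`.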

You could even add some type checking to make sure it's wrapping the expected class:

def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
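
To see the guard in action without a real PDF, the following sketch fills in the elided part with a direct call to `fn`. That is an assumption made only so the example runs; the real wrapper would still open the file and test `is_extractable`. `NotAPamplemousse` and `HasPdfDoc` are hypothetical classes:

```python
import functools


def if_extractable(fn):
    @functools.wraps(fn)
    def wrapped(self, *args):
        # Duck-typed guard: anything without a pdf_doc attribute is rejected.
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        # The real wrapper would open the PDF and test is_extractable here;
        # this sketch calls fn directly to stay self-contained.
        return fn(self, *args)
    return wrapped


class NotAPamplemousse(object):
    """Hypothetical class lacking the pdf_doc attribute."""
    @if_extractable
    def get_toc(self):
        return []


class HasPdfDoc(object):
    """Hypothetical class that satisfies the guard."""
    def __init__(self):
        self.pdf_doc = 'file.pdf'

    @if_extractable
    def get_toc(self):
        return []
```

Note that `hasattr` checks for the attribute rather than the class, so any object exposing `pdf_doc` passes; `isinstance(self, Pamplemousse)` would be the stricter alternative.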
jdi answered Nov 21 '25 13:11
