
Design of a python pickleable object that describes a file

Tags:

python

pickle

I would like to create a class that describes a file resource and then pickle it. That part is straightforward. To be concrete, say I have a class "A" with methods that operate on a file. I can pickle an instance as long as it does not contain a file handle, but I also want to be able to create a file handle to access the resource that "A" describes. If "A" has an "open()" method that opens the file and stores the handle for later use, then "A" is no longer picklable. (Opening the file involves some non-trivial indexing done by third-party code, which cannot be cached, so closing and reopening on demand is not free.) I could write "A" as a factory that generates file handles for the described file, but that could leave multiple handles accessing the file contents simultaneously. Alternatively, a second class "B" could handle opening the file described by "A", including locking and so on. I am probably overthinking this, but any hints would be appreciated.

asked Nov 30 '25 17:11 by seandavi



1 Answer

The question isn't entirely clear, but it looks like:

  • you have a third-party module whose classes are picklable
  • instances of those classes may hold references to open files, which makes them unpicklable, because open files can't be pickled.

Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:

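import pickle

class PicklableFile(object):
    """Wrap an open file so it can be pickled and reopened on unpickling."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Delegate everything else (read, readline, ...) to the wrapped file.
        # The guard avoids infinite recursion if fileobj is not set yet.
        if key == 'fileobj':
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        # Replace the open file with what is needed to reopen it later.
        ret = self.__dict__.copy()
        ret['_file_name'] = self.fileobj.name
        ret['_file_mode'] = self.fileobj.mode
        ret['_file_pos'] = self.fileobj.tell()
        del ret['fileobj']
        return ret

    def __setstate__(self, state):
        # Reopen the file, seek back to the saved position, restore the rest.
        state = dict(state)
        self.fileobj = open(state.pop('_file_name'), state.pop('_file_mode'))
        self.fileobj.seek(state.pop('_file_pos'))
        self.__dict__.update(state)

f = PicklableFile(open("/tmp/blah"))
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)
print(f2.read())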

Caveats and notes, some obvious, some less so:

  • This class should operate directly on the file object you got from open. If you're using wrapper classes on files, like gzip.GzipFile, those should go above this, not below it. Logically, treat this as a decorator class on top of file.
  • If the file doesn't exist when you unpickle and the mode doesn't create it, reopening fails and unpickling raises an exception.
  • If it's a different file, the behavior may or may not make sense.
  • If the file mode includes file creation (e.g. 'w+') and the file doesn't exist, it'll be created; we don't know what file permissions to use, since those aren't stored with the file. If this matters (it probably shouldn't), store the correct permissions in the class when you first create it. Note also that reopening with 'w' or 'w+' truncates an existing file.
  • If the file isn't seekable, trying to seek to the old position may raise IOError; if you're using a file like that, you'll need to decide how to handle it.
  • The file classes in Python 2 and Python 3 are different; there's no file class in Python 3. Even if you're only using Python 2 right now, don't subclass file.
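
As a rough sketch of how a couple of these caveats might be handled (this is only an illustration, not part of the sample above): the hypothetical SaferPicklableFile below records no position when tell() fails, and turns a failed reopen into a pickle.UnpicklingError. The class name and the choice of exception are assumptions made for the example.

```python
import os
import pickle
import tempfile


class SaferPicklableFile(object):
    """Hypothetical variant of PicklableFile with a more defensive restore."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Delegate to the wrapped file; guard against recursion when
        # fileobj has not been set yet.
        if key == "fileobj":
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        state = self.__dict__.copy()
        fileobj = state.pop("fileobj")
        state["_file_name"] = fileobj.name
        state["_file_mode"] = fileobj.mode
        try:
            state["_file_pos"] = fileobj.tell()
        except (IOError, OSError):
            # Position unavailable (e.g. not seekable): remember that.
            state["_file_pos"] = None
        return state

    def __setstate__(self, state):
        state = dict(state)
        name = state.pop("_file_name")
        mode = state.pop("_file_mode")
        pos = state.pop("_file_pos")
        self.__dict__.update(state)
        try:
            self.fileobj = open(name, mode)
        except (IOError, OSError) as exc:
            # Assumption: a missing file is reported as an unpickling error.
            raise pickle.UnpicklingError("cannot reopen %r: %s" % (name, exc))
        if pos is not None:
            self.fileobj.seek(pos)


# Demo on a throwaway temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")

f = SaferPicklableFile(open(tmp.name))
first_line = f.readline()
data = pickle.dumps(f)
f.close()

f2 = pickle.loads(data)   # reopens the file at the saved position
rest = f2.read()
f2.close()

os.remove(tmp.name)
try:
    pickle.loads(data)    # the file is gone now
    missing_error = None
except pickle.UnpicklingError as exc:
    missing_error = exc
```

This still does nothing about truncating modes or a file that has been replaced by a different one; those remain the caller's problem.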

I'd steer away from doing this; pickled data that depends on external files not changing and staying in the same place is brittle. Even relocating a file becomes difficult, since the pickled data will then refer to a path that is no longer valid.
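
One possible alternative, closer to the class "A" described in the question, is to pickle only the description of the file and never the handle. The sketch below is a guess at what that could look like, not a tested design: FileResource, handle and close are hypothetical names, and the expensive third-party open is stood in for by a plain open().

```python
import os
import pickle
import tempfile


class FileResource(object):
    """Describes a file by path; the handle is opened lazily and never pickled."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self._handle = None  # opened on first use

    @property
    def handle(self):
        # Open at most once per live object, so the cost of opening
        # (and any indexing) is paid only when the file is first needed.
        if self._handle is None:
            self._handle = open(self.path, self.mode)
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Pickle the description only; drop the unpicklable handle.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


# Demo on a throwaway temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")

res = FileResource(tmp.name)
first = res.handle.readline()

clone = pickle.loads(pickle.dumps(res))  # works even though res has an open handle
clone_had_handle = clone._handle is not None
clone_first = clone.handle.readline()    # the copy reopens the file on demand

res.close()
clone.close()
os.remove(tmp.name)
```

The unpickled copy starts from the beginning of the file rather than from a saved position, and each live object still holds at most one handle; the dependence on the path staying valid remains.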

answered Dec 03 '25 09:12 by Glenn Maynard



