I have an object gui_project which has an attribute .namespace, which is a namespace dict. (i.e. a dict from strings to objects.)
(This is used in an IDE-like program to let the user define his own object in a Python shell.)
I want to pickle this gui_project, along with the namespace. Problem is, some objects in the namespace (i.e. values of the .namespace dict) are not picklable objects. For example, some of them refer to wxPython widgets.
I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.
How can I do this?
(One thing I tried is to go one by one on the values and try to pickle them, but some infinite recursion happened, and I need to be safe from that.)
(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)
Generally you can pickle any object if you can pickle every attribute of that object. Classes, functions, and methods cannot be pickled -- if you pickle an object, the object's class is not pickled, just a string that identifies what class it belongs to.
In general, pickling a dict will fail unless you have only simple objects in it, like strings and integers. Even a really simple dict will often fail. It just depends on the contents.
“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
Pickle dump replaces current file data.
I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.
http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects
ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.
You could start with something like this (untested):
import cPickle
def persistent_id(obj):
    if isinstance(obj, wxObject):
        return "filtered:wxObject"
    else:
        return None
class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)
def persistent_load(obj_id):
    if obj_id.startswith('filtered:'):
        return FilteredObject(obj_id[9:])
    else:
        raise cPickle.UnpicklingError('Invalid persistent id')
def dump_filtered(obj, file):
    p = cPickle.Pickler(file)
    p.persistent_id = persistent_id
    p.dump(obj)
def load_filtered(file)
    u = cPickle.Unpickler(file)
    u.persistent_load = persistent_load
    return u.load()
Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.
You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.
Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.
from cPickle import Pickler, Unpickler, UnpicklingError
class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)
class MyPickler(object):
    def __init__(self, file, protocol=0):
        pickler = Pickler(file, protocol)
        pickler.persistent_id = self.persistent_id
        self.dump = pickler.dump
        self.clear_memo = pickler.clear_memo
    def persistent_id(self, obj):
        if not hasattr(obj, '__getstate__') and not isinstance(obj,
            (basestring, int, long, float, tuple, list, set, dict)):
            return "filtered:%s" % type(obj)
        else:
            return None
class MyUnpickler(object):
    def __init__(self, file):
        unpickler = Unpickler(file)
        unpickler.persistent_load = self.persistent_load
        self.load = unpickler.load
        self.noload = unpickler.noload
    def persistent_load(self, obj_id):
        if obj_id.startswith('filtered:'):
            return FilteredObject(obj_id[9:])
        else:
            raise UnpicklingError('Invalid persistent id')
if __name__ == '__main__':
    from cStringIO import StringIO
    class UnpickleableThing(object):
        pass
    f = StringIO()
    p = MyPickler(f)
    p.dump({'a': 1, 'b': UnpickleableThing()})
    f.seek(0)
    u = MyUnpickler(f)
    obj = u.load()
    print obj
    assert obj['a'] == 1
    assert isinstance(obj['b'], FilteredObject)
    assert obj['b'].about
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With