I have a soup from BeautifulSoup that I cannot pickle. When I try to pickle the object, the Python interpreter silently crashes, so the failure cannot be caught as an exception. I need to pickle the object in order to return it using the multiprocessing package (which pickles objects to pass them between processes). How can I troubleshoot or work around the problem? Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to build a reproducible example of the problem. I have tried to isolate the problem by looping over the soup and pickling individual components; the smallest thing that produces the error is <class 'BeautifulSoup.NavigableString'>. When I print the object, it prints u'\n'.
The class NavigableString cannot be serialized with pickle or cPickle, which is what multiprocessing uses. You should be able to serialize it with dill, however: dill provides a superset of the pickle interface and can serialize most Python objects. The standard multiprocessing will still fail, because it pickles objects internally; pathos.multiprocessing is a fork of multiprocessing that uses dill instead.
Get the code for dill and pathos here: https://github.com/uqfoundation.
For more information, see "What can multiprocessing and dill do together?" and the following links:
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
http://nbviewer.ipython.org/gist/minrk/5241793
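If switching to dill and pathos is not an option, another possible workaround is to pass plain strings between processes instead of parse-tree objects. The sketch below is only an illustration under stated assumptions: `TreeString` and `to_plain` are hypothetical names, and `TreeString` is a stand-in that loosely mimics a string node carrying a reference back into its tree. It is not BeautifulSoup's actual implementation, and it is written for Python 3.

```python
import pickle


class TreeString(str):
    """Hypothetical stand-in for a parse-tree string node (not BeautifulSoup code)."""

    def __new__(cls, value, parent=None):
        node = super().__new__(cls, value)
        # Back-reference into the tree, similar in spirit to what a real node carries.
        node.parent = parent
        return node


def to_plain(node):
    """Return only the text as an exact str, dropping the subclass and its attributes."""
    return str(node)


node = TreeString("\n", parent=object())
plain = to_plain(node)

# The plain copy has no link to the tree, so it round-trips through pickle.
restored = pickle.loads(pickle.dumps(plain))
```

With a real soup, the same idea would be to have the worker return `str(soup)` (or `unicode(soup)` on Python 2) and re-parse the markup in the parent process. This costs a second parse, so it is a trade-off rather than a fix for the underlying pickling problem.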