How can I speed up unpickling large objects if I have plenty of RAM?

Question

It's taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (it's 1 GB when stored on disk as a binary pickle file).

Note that the file quickly loads into memory. In other words, if I run:

import cPickle as pickle

with open("bigNetworkXGraph.pickle", "rb") as f:
    binary_data = f.read()  # This part doesn't take long
graph = pickle.loads(binary_data)  # This takes ages

How can I speed this last operation up?

Note that I have tried pickling the data using both binary protocols (1 and 2), and it doesn't seem to make much difference which one I use. Also note that although I am using the "loads" (meaning "load string") function above, it is loading binary data, not ASCII data.

I have 128 GB of RAM on the system I'm using, so I'm hoping that somebody will tell me how to increase some read buffer buried in the pickle implementation.

Tejas Shah · Accepted Answer

I had great success reading a ~750 MB igraph data structure (a binary pickle file) using cPickle itself. This was achieved by simply disabling the garbage collector around the pickle load call.

In your case, the snippet would look something like this:

import cPickle as pickle
import gc

# Disable the cyclic garbage collector while unpickling
gc.disable()
try:
    with open("bigNetworkXGraph.pickle", "rb") as f:
        graph = pickle.load(f)
finally:
    # Re-enable the garbage collector, even if loading fails
    gc.enable()

This definitely isn't the most elegant way to do it, but it reduces the time required drastically.
(For me, it went from 843.04 s to 41.28 s, around 20x.)
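
A likely reason this helps: unpickling a large graph allocates a huge number of container objects, and each burst of allocations can trigger a cyclic garbage collection pass over an ever-growing set of live objects. The sketch below wraps the same idea in a reusable context manager. It assumes Python 3 (where cPickle was folded into pickle), and `gc_paused` is a hypothetical helper name, not part of the standard library.

```python
import gc
import pickle
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state, even if the wrapped code raises.
        if was_enabled:
            gc.enable()


def loads_without_gc(binary_data):
    """Unpickle a bytes object while the garbage collector is paused."""
    with gc_paused():
        return pickle.loads(binary_data)
```

Calling `loads_without_gc(binary_data)` on the bytes read from the file is then a drop-in replacement for `pickle.loads(binary_data)`. Whether the speedup matches the numbers above will depend on the data, so it is worth timing both variants.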

wump · Answer

You're probably bound by Python object creation/allocation overhead, not the unpickling itself. If so, there is little you can do to speed this up, except not creating all the objects. Do you need the entire structure at once? If not, you could use lazy population of the data structure (for example: represent parts of the structure by pickled strings, then unpickle them only when they are accessed).
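
The lazy approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python 3's pickle, `pack_parts` and `LazyParts` are hypothetical names, and the adjacency dict is made-up sample data rather than a real NetworkX graph.

```python
import pickle


def pack_parts(parts):
    """Pickle every part on its own and return a dict of key -> bytes."""
    return {key: pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
            for key, value in parts.items()}


class LazyParts:
    """Unpickle a part only the first time it is accessed (hypothetical helper)."""

    def __init__(self, blobs):
        self._blobs = blobs   # key -> pickled bytes, cheap to keep around
        self._cache = {}      # key -> decoded object, filled on demand

    def __getitem__(self, key):
        if key not in self._cache:
            # The object-creation cost is paid here, for this part only.
            self._cache[key] = pickle.loads(self._blobs[key])
        return self._cache[key]

    def loaded_keys(self):
        """Keys of the parts that have been decoded so far."""
        return sorted(self._cache)


# Write side: the outer pickle holds only byte strings, so few objects
# need to be created when it is loaded.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}  # made-up sample data
outer = pickle.dumps(pack_parts(adjacency), protocol=pickle.HIGHEST_PROTOCOL)

# Read side: nothing is decoded until a key is requested.
lazy = LazyParts(pickle.loads(outer))
neighbours_of_a = lazy["a"]
```

The trade-off is that the total work is not reduced if every part ends up being accessed; the gain comes only from parts that are never touched.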

Tags: python, serialization, pickle

conradlee