We use the pandas DataFrame as the primary container for our time series data. We pickle and compress each DataFrame into a binary blob and store it in a MongoDB document, along with metadata keys describing the time series.
We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.
Create binary blob of pandas Dataframe (0.14.1)
import lz4   
import cPickle
bd = lz4.compress(cPickle.dumps(df,cPickle.HIGHEST_PROTOCOL))
Error Case: Read back in from mongoDB with pandas 0.15.2
cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))
Success Case: Read back in from mongoDB with pandas 0.14.1 with no error.
This seems similar to an old Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:
"The error message you are seeing, TypeError: _reconstruct: First argument must be a sub-type of ndarray, is that the python default unpickler makes sure that the class hierarchy that was pickled is exactly the same what it is recreating. Since Series has changed between versions this is no longer possible with the default unpickler, (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."
Any ideas for a workaround or solution?
To recreate error:
Setup in pandas 0.14.1 env:
import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10,10))
cPickle.dump(df,open("cp0141.p","wb"))
cPickle.load(open('cp0141.p','rb')) # no error
Create error in pandas 0.15.2 env:
cPickle.load(open('cp0141.p','rb'))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))
Because of the number of changes in pandas 1.0, some of the pandas APIs are now backwards-incompatible. This includes changes to the behavior of many common elements, including the DataFrame type.
Pandas DataFrame: to_pickle() function. The to_pickle() function is used to pickle (serialize) an object to a file. Its path argument is the file path where the pickled object will be stored. Its compression argument is a string naming the compression to use for the output file; by default, it is inferred from the file extension of the specified path.
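pickle saves the DataFrame in its current state, so the data and its format (dtypes and index) are preserved. This can make loading considerably faster than re-parsing a text format such as CSV, because nothing has to be inferred again on load.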
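As a rough illustration of the to_pickle() behavior described above, the sketch below writes a DataFrame to a gzip-compressed pickle and reads it back. It assumes a reasonably recent pandas on Python 3; the file name frame.pkl.gz and the temporary directory are made up for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the ".gz" suffix lets pandas infer gzip compression.
    path = os.path.join(tmp_dir, "frame.pkl.gz")
    df.to_pickle(path)
    # read_pickle infers the compression from the extension as well.
    restored = pd.read_pickle(path)

# Values, dtypes and index survive the round trip.
print(restored.equals(df))
```

Note that this round trip is within a single pandas version; it does not by itself show anything about reading pickles across versions.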
This was explicitly mentioned in the pandas 0.15.0 release notes: the Index class no longer subclasses ndarray but is now a pandas object.
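A quick way to see this change for yourself, assuming pandas 0.15.0 or later, is to check the class hierarchy directly. The variable names here are only for illustration.

```python
import numpy as np
import pandas as pd

index = pd.Index([1, 2, 3])

# On pandas >= 0.15.0 the Index type is not an ndarray subclass, which is
# why the plain unpickler's _reconstruct call rejects pickles written by
# older versions.
is_ndarray_subclass = issubclass(type(index), np.ndarray)
print(is_ndarray_subclass)

# The underlying data is still available as a NumPy array.
values = index.values
print(type(values))
```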
You simply need to use pd.read_pickle to read the pickles.
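For blobs that live in a database rather than on disk, one possible way to apply this is to decompress the bytes and hand them to pd.read_pickle through an in-memory buffer. This is only a sketch under several assumptions: it targets Python 3 and a recent pandas that accepts file-like objects in read_pickle, it uses the standard-library zlib as a stand-in for the lz4 compression in the question, and load_frame is a hypothetical helper name. How far back pd.read_pickle can read old pickles depends on the installed pandas version.

```python
import io
import pickle
import zlib

import numpy as np
import pandas as pd

# Build a compressed blob the way the question does, with zlib standing in
# for lz4 (an assumption made so the example only needs the standard library).
df = pd.DataFrame(np.random.randn(10, 10))
blob = zlib.compress(pickle.dumps(df, pickle.HIGHEST_PROTOCOL))


def load_frame(blob):
    """Hypothetical helper: decompress a blob and unpickle it via pandas."""
    raw = zlib.decompress(blob)
    # pd.read_pickle applies pandas' compatibility handling, unlike a bare
    # pickle.loads call on the same bytes.
    return pd.read_pickle(io.BytesIO(raw))


restored = load_frame(blob)
print(restored.shape)
```

Once old blobs load correctly, re-saving them under the new pandas version avoids depending on the compatibility path for every read.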