In some cases, when I load an existing pickle file, and after that dump it again, the size is almost halved.
I wonder why, and the first suspect is the protocol version. Can I somehow find out with which protocol version a file was pickled?
Protocol version 1 is an old binary format which is also compatible with earlier versions of Python. Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
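A protocol difference is a plausible reason for a file shrinking after a load and re-dump, since `pickle.dump()` uses the interpreter's default protocol unless told otherwise. As a rough sketch, the following pickles the same object with every protocol the running interpreter supports and compares the sizes. The sample `data` is made up for illustration, and the ratios depend heavily on what is being pickled.

```python
import pickle

# Hypothetical sample data; real objects will give different ratios.
data = {"name": "example", "values": list(range(1000))}

# Map each supported protocol number to the size of the resulting pickle.
sizes = {
    protocol: len(pickle.dumps(data, protocol=protocol))
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```

For this kind of data the text-based protocol 0 output should be noticeably larger than the binary protocols.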
We can also use the pandas library to read a pickle file in Python. The pandas module has a read_pickle() function that reads a pickle file and returns the object stored in it.
Pickle can be used to serialize Python object structures, which refers to the process of converting an object in the memory to a byte stream that can be stored as a binary file on disk. When we load it back to a Python program, this binary file can be de-serialized back to a Python object.
To retrieve pickled data, the steps are quite simple: use the pickle.load() function. Its primary argument is the file object that you get by opening the file in read-binary (rb) mode.
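A minimal round trip might look like the sketch below. The `record` dictionary and the file name `record.pkl` are placeholders chosen for the example, and a temporary directory is used so nothing is left on disk.

```python
import os
import pickle
import tempfile

# Placeholder object to serialize.
record = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.pkl")

    # Serialize: the file must be opened in write-binary mode.
    with open(path, "wb") as f:
        pickle.dump(record, f)

    # De-serialize: the file must be opened in read-binary mode.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == record)
```

The loaded object is a new object that compares equal to the original, not the same object.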
There may be a more elegant way but to get down to the metal you can use pickletools:
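import pickle
import pickletools

s = pickle.dumps('Test')
# genops yields (opcode, argument, position) tuples; pickles written with
# protocol 2 or later start with a PROTO opcode whose argument is the version.
proto_op = next(pickletools.genops(s))
assert proto_op[0].name == 'PROTO'  # fails for protocol 0 and 1 pickles
proto_ver = proto_op[1]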
To figure out the version required to decode this, you need the maximum protocol version over all of its opcodes:
proto_ver = max(op[0].proto for op in pickletools.genops(s))
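To apply the same idea to a file on disk, as the question asks, something like the following sketch could work. `pickle_protocol_info` is a hypothetical helper name, not part of the standard library, and the sample files are written to a temporary directory purely for demonstration. It relies on `pickletools.genops` accepting a file object opened in binary mode.

```python
import os
import pickle
import pickletools
import tempfile


def pickle_protocol_info(path):
    """Return (declared, required) protocol versions for a pickle file.

    declared is the argument of a leading PROTO opcode, or None if the
    stream has none (protocols 0 and 1 do not write one).
    required is the highest protocol used by any opcode in the stream.
    """
    declared = None
    required = 0
    with open(path, "rb") as f:
        for index, (opcode, arg, _pos) in enumerate(pickletools.genops(f)):
            if index == 0 and opcode.name == "PROTO":
                declared = arg
            required = max(required, opcode.proto)
    return declared, required


# Demonstration: write the same value with two protocols and inspect each file.
with tempfile.TemporaryDirectory() as tmp_dir:
    results = {}
    for protocol in (0, 2):
        path = os.path.join(tmp_dir, f"sample_p{protocol}.pkl")
        with open(path, "wb") as f:
            pickle.dump("Test", f, protocol=protocol)
        results[protocol] = pickle_protocol_info(path)

print(results)
```

Running this helper on the original file and on the re-dumped file shows whether the two were written with different protocols.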