
Python data persistence

Whenever a Python object needs to be stored or sent over a network, it is first serialized. I guess the reason is that storage and network transfer are both based on bits. I have a stupid question, which is more of a computer science foundations question than a Python question. What kind of format do Python objects take when they are in cache? Shouldn't they already be represented as bits? If that's the case, why not just use those bits to store or send the object, and why bother with serialization?

David Zheng asked Feb 16 '26 12:02



1 Answer

Bit Representation

The same object can have different bit representations on different machines:

  • Think endianness (byte order)
  • and architecture (32 bits, 64 bits)

So an object's bit representation on the sending machine could mean nothing (or worse, could mean something else) once it arrives on the receiving machine.

Take a simple integer, 1025, stored as a 32-bit value, as an illustration of the problem:

  • On a big-endian machine the bytes in memory are:
    • binary: 00000000 00000000 00000100 00000001
    • hexadecimal: 00 00 04 01
  • while on a little-endian machine they are:
    • binary: 00000001 00000100 00000000 00000000
    • hexadecimal: 01 04 00 00
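The two layouts above can be reproduced with a minimal Python sketch. It assumes a 4-byte integer, and the variable names (`value`, `big`, `little`, `misread`) are only illustrative:

```python
value = 1025

# Lay the same integer out as 4 bytes in each byte order.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 00000401
print(little.hex())  # 01040000

# Reading the big-endian bytes as if they were little-endian
# silently produces a different number.
misread = int.from_bytes(big, byteorder="little")
print(misread)  # 17039360
```

The last line is the "could mean something else" case: the bytes are valid, but the receiver's interpretation is wrong.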

That's why, to understand each other, two machines have to agree on a convention, a protocol. The IP protocol, for example, uses network byte order (big-endian) as its convention.
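As a small sketch of what agreeing on a convention looks like in Python, the standard `struct` module lets both ends name the byte order explicitly. The names `wire` and `received` are made up for this example, and no real socket is involved:

```python
import struct

# "!" selects network byte order (big-endian); "I" is a 4-byte unsigned int.
wire = struct.pack("!I", 1025)
print(wire.hex())  # 00000401

# The receiver decodes with the same format string,
# whatever its own native byte order is.
(received,) = struct.unpack("!I", wire)
print(received)  # 1025
```

Because the format string fixes the byte order, the result does not depend on the endianness of the machine running the code.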

More on endianness in this question

Serialization (and Deserialization)

We can't directly send an object's underlying bit representation over the network, for the reasons described above, but those are not the only reasons.

An object can refer to another object internally through a pointer (the in-memory address of that second object). This address is, again, platform-dependent, and it only means something inside the process that holds it.

Python solves this using a serialization algorithm called pickling, which transforms an object hierarchy into a byte stream. The layout of that byte stream is defined by the pickle protocol rather than by the sender's memory layout, and that shared protocol is what lets both ends understand each other.
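A minimal sketch of such a round trip, with a made-up two-element object hierarchy (`shared` and `graph` are illustrative names), shows that the references between objects survive even though no memory address is reused:

```python
import pickle

# Two list slots point at the same dict object in memory.
shared = {"name": "node"}
graph = [shared, shared]

# Serialize the whole hierarchy to bytes, then rebuild it.
payload = pickle.dumps(graph)
restored = pickle.loads(payload)

print(type(payload).__name__)       # bytes
print(restored == graph)            # True
print(restored[0] is restored[1])   # True: the shared reference is preserved
print(restored[0] is shared)        # False: it is a new object, not the old address
```

In a real setup `payload` would be written to a file or a socket. Note that unpickling data from an untrusted source is unsafe, as the pickle documentation warns.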

Pickle module documentation

arainone answered Feb 18 '26 00:02



