
What is the safest way to reassemble pickled Python objects in a stream?

I'm currently sending pickled Python (3.8) objects via sockets between two running programs. I have a buffer of bytes I want to reconstruct into their corresponding objects on the receiving end.

To my understanding, the socket.recv method is not guaranteed to catch all bytes sent, and it's up to the caller to call socket.recv again to pick up the rest of the data. So, at any given time, my buffer could contain partial packets.

Additionally, due to my use of threading, I could receive multiple messages before I check the buffer.

Here's my question:

Given that I'm receiving a byte stream of arbitrary length, which may contain fewer or more than one pickle object, what is the best way to reassemble them? Is there a character I can use as a terminator that is guaranteed to not conflict with pickle?

attribute_error asked Oct 19 '25 12:10


1 Answer

Is there a character I can use as a terminator that is guaranteed to not conflict with pickle?

Unfortunately, there isn't. Pickle serializes data in a binary format, so any byte value (and therefore any candidate terminator sequence) can appear inside a pickled object.
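As a quick illustration of why no terminator is safe, the sketch below pickles a bytes payload that contains a candidate terminator and checks whether the terminator survives verbatim in the pickled output. The helper name terminator_collides is hypothetical and only serves this demonstration; it relies on the fact that binary pickle protocols store bytes payloads raw.

```python
import pickle

def terminator_collides(candidate: bytes) -> bool:
    """Return True if `candidate` appears inside the pickle of a payload containing it."""
    # Hypothetical payload: any bytes object that happens to include the candidate.
    payload = b"prefix" + candidate + b"suffix"
    return candidate in pickle.dumps(payload)

# Try every possible single-byte terminator.
collisions = [terminator_collides(bytes([value])) for value in range(256)]
print(all(collisions))
```

Longer terminators do not help either, since the payload can contain the same multi-byte sequence.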

what is the best way to reassemble them?

The most common (and probably also the easiest) thing to do when dealing with this kind of problem is to send a fixed-size header that indicates the size of the data that is going to be received.

You can use struct.pack() to create an 8-byte header containing the size of the pickled object as an unsigned integer in network (big-endian) byte order, and send it before the actual data. On the receiving end, you first receive the 8-byte header, decode it to learn the size of the data that follows, and then receive exactly that number of bytes.
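To make the header format concrete on its own, here is a small sketch of packing and unpacking a length with the "!Q" format. The value 300 is an arbitrary example size, and the HEADER name is just a convenience introduced here.

```python
import struct

# "!" selects network (big-endian) byte order, "Q" an 8-byte unsigned integer.
HEADER = struct.Struct("!Q")

header = HEADER.pack(300)          # 300 is an arbitrary example payload size
(decoded,) = HEADER.unpack(header)

print(HEADER.size)   # 8
print(header.hex())  # 000000000000012c
print(decoded)       # 300
```

Because the header always has the same length, the receiver knows exactly how many bytes to read before it can decode the payload size.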

Here's a (simplified) example:

  • Sender:

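    import pickle
    import struct

    class Example:
        pass

    data = pickle.dumps(Example())
    size = len(data)
    # 8-byte unsigned length prefix in network (big-endian) byte order
    header = struct.pack("!Q", size)

    # open socket...

    # sendall() keeps sending until every byte is written or an error occurs
    sock.sendall(header)
    sock.sendall(data)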
    
  • Receiver:

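    import pickle
    import struct

    class Example:
        pass

    def receive_exactly(sock, n):
        # recv() may return fewer bytes than requested, so loop until done
        data = bytearray()

        while len(data) < n:
            chunk = sock.recv(n - len(data))
            if not chunk:
                # recv() returns b'' once the peer has closed the connection
                raise ConnectionError("socket closed before the full message arrived")
            data += chunk

        return bytes(data)

    # open socket...

    header = receive_exactly(sock, 8)
    size = struct.unpack("!Q", header)[0]
    data = receive_exactly(sock, size)
    e = pickle.loads(data)  # only unpickle data from a peer you trust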
    

Note that the two snippets above are only simple examples; in real code you should add proper error checking and handling around sendall() and recv().
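If, as in the question, another thread appends whatever recv() returns to a shared buffer, the same length-prefix idea can be applied to that buffer instead of to the socket directly. The following is a minimal sketch under that assumption; MessageBuffer, feed, and frame are hypothetical names, and locking around the buffer is left out.

```python
import pickle
import struct

HEADER = struct.Struct("!Q")  # same 8-byte length prefix as above

class MessageBuffer:
    """Accumulates raw bytes and returns every complete unpickled object."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buf += chunk
        objects = []
        # Extract messages for as long as a full header and payload are buffered.
        while len(self._buf) >= HEADER.size:
            (size,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + size
            if len(self._buf) < end:
                break  # partial message: wait for more bytes
            objects.append(pickle.loads(bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return objects

def frame(obj) -> bytes:
    """Prefix the pickled object with its length."""
    data = pickle.dumps(obj)
    return HEADER.pack(len(data)) + data

# Two messages arriving split at an arbitrary point in the stream.
stream = frame({"id": 1}) + frame([1, 2, 3])
buf = MessageBuffer()
first = buf.feed(stream[:5])    # not even a full header yet
second = buf.feed(stream[5:])   # both messages complete
print(first, second)
```

Each call to feed returns zero or more objects, so it handles both a partial message and several messages arriving at once.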

Marco Bonelli answered Oct 22 '25 00:10