I am going to store some big objects in a database as BLOBs. Protobuf is, as I see it, one of the best candidates for serializing and deserializing them. Although it is a binary format, its content (strings, integers, etc.) is still easy to read and to change. So I need some kind of data validation to tell whether a BLOB is the original or has been modified (by a hacker? by a too-smart user?).
One possibility would be to have a dedicated field in the table, call it crc, calculate the checksum of the BLOB and put it there. But it would be much better (in many scenarios) if the crc were part of the BLOB itself.
I can add extra bytes to the end of the protobuf stream, but I will have to delete them before deserializing (or the deserializer will throw an exception, "invalid field blablabla").
I can put the protobuf stream into a wrapper, but wrapping and unwrapping is again overhead.
Is there an easy and cheap way to add something to the end of a protobuf stream that needs no additional operations during deserialization? In XML, I could add a comment. I don't think protobuf has comments, but how could I embed a CRC of, for example, 1 or 2 bytes?
No it does not; there is no "compression" as such specified in the protobuf spec; however, it does (by default) use "varint encoding" - a variable-length encoding for integer data that means small values use less space; so 0-127 take 1 byte plus the header.
Protocol buffers messages always use little-endian encoding. Implementations running on big-endian architectures should be doing the conversions automatically. If you are receiving data in wrong order, I would suggest using protoc --decode_raw to see whether the error occurs on the transmission or reception side.
Protobuf uses UTF-8 for strings, but that is an implementation detail that you should never see. If your concern is that the text may take more bytes in UTF-8 than in UTF-16 (for the code points in question), then you can always use a "bytes" type and handle the text encoding yourself.
Protobuf streams are appendable. If you know a field number that doesn't exist in the data, you can simply append data against that field. If you are intending to add 1 or 2 bytes of CRC data, then a "varint" is probably your best bet (note that "varint" is a 7-bit encoding format with the 8th bit as a continuation marker, so you probably want to use 7, 14 or 21 bits of actual CRC data); then you can just append it as an extra field at the end of the stream.
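To make the append concrete, here is a minimal sketch at the wire-format level. It is an illustration under assumptions, not a library API: the names `CrcTrailer`, `WriteVarint` and `AppendCrc` are hypothetical, field number 15 is an arbitrary choice that must be unused in your schema, and the CRC value is assumed to be computed elsewhere and truncated to 14 bits.

```csharp
using System;
using System.IO;

static class CrcTrailer
{
    // Hypothetical field number; it must not be used by the message schema.
    public const int CrcFieldNumber = 15;

    // Writes an unsigned value as a protobuf base-128 varint:
    // 7 payload bits per byte, high bit set on every byte except the last.
    public static void WriteVarint(Stream output, ulong value)
    {
        while (value >= 0x80)
        {
            output.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    // Returns a copy of the serialized message with one extra varint field appended.
    public static byte[] AppendCrc(byte[] message, uint crc)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(message, 0, message.Length);
            // Field key = (field number << 3) | wire type; wire type 0 means varint.
            WriteVarint(ms, ((ulong)CrcFieldNumber << 3) | 0);
            // Keep only 14 bits so the value needs at most two varint bytes.
            WriteVarint(ms, crc & 0x3FFF);
            return ms.ToArray();
        }
    }
}
```

For example, appending the value 0x1234 to the two-byte message `08 01` gives `08 01 78 B4 24`. Note that the trailer is two or three bytes long depending on the CRC value, so whoever verifies it has to parse the last field rather than cut off a fixed number of bytes.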
However! The wrinkle in this is that the decoder will still often interpret and store this data, meaning that if you serialize the object again, the output will include this data.
The other approach, which avoids this, would be to encapsulate the protobuf data in some framing mechanism of your own devising. For example, you could choose to write a length prefix and a CRC alongside the protobuf payload.
I'd probably go with the second option. Note that you could choose "varint" encoding rather than fixed length encoding for the length prefix if you want. Probably not worth it for the CRC, though, since that will be fixed length.
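As a rough sketch of what such a frame could look like (my assumption, not a fixed format): a 4-byte little-endian length, a 2-byte CRC, then the protobuf bytes. The names `Frame`, `Pack`, `Unpack` and `Crc16` are hypothetical, and CRC-16/CCITT-FALSE is picked only for illustration; any checksum of fixed size would fit the same layout.

```csharp
using System;
using System.IO;

static class Frame
{
    // CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection.
    public static ushort Crc16(byte[] data)
    {
        ushort crc = 0xFFFF;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 0x8000) != 0
                    ? (ushort)(((crc << 1) ^ 0x1021) & 0xFFFF)
                    : (ushort)((crc << 1) & 0xFFFF);
            }
        }
        return crc;
    }

    // Layout: [int32 length][uint16 crc][payload], all little-endian.
    public static byte[] Pack(byte[] payload)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(payload.Length);
            writer.Write(Crc16(payload));
            writer.Write(payload);
            writer.Flush();
            return ms.ToArray();
        }
    }

    // Returns the payload, or throws if the frame is truncated or was modified.
    public static byte[] Unpack(byte[] frame)
    {
        using (var reader = new BinaryReader(new MemoryStream(frame)))
        {
            int length = reader.ReadInt32();
            ushort expected = reader.ReadUInt16();
            byte[] payload = reader.ReadBytes(length);
            if (payload.Length != length || Crc16(payload) != expected)
                throw new InvalidDataException("Frame is truncated or fails the CRC check.");
            return payload;
        }
    }
}
```

The bytes returned by `Unpack` can then be wrapped in a `MemoryStream` and handed to the deserializer. Keep in mind that a plain CRC only catches accidental or naive changes; anyone who knows the layout can recompute it.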
The CRC should be saved before the protobuf data. This makes deserialization from the stream trivial: use Seek to skip the header.
Here is the simplest implementation:
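// serialize
using (var file = File.Create("test.bin"))
using (var mem = new MemoryStream())
{
    Serializer.Serialize(mem, obj); // serialize obj into memory first
    // ... calculate crc (a single byte here) over the contents of mem
    file.WriteByte(crc);            // the 1-byte CRC header goes first
    mem.WriteTo(file);              // the protobuf payload follows
}
// deserialize
using (var file = File.OpenRead("test.bin"))
{
    var crc = file.ReadByte();      // stored CRC (ReadByte returns -1 if the file is empty)
    // ... calculate and check crc over the remaining bytes
    file.Seek(1, SeekOrigin.Begin); // return to the position just past the header
    var restored = Serializer.Deserialize<ObjType>(file);
}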