
Considerations for binary serializations (Protobuf, CBOR, MessagePack, etc.) for a long term archive data format

In discussions about a next-generation scientific data format, a need for some kind of JSON-like data structure (logical grouping of fields) has been identified. Additionally, it would be preferable to leverage an existing encoding instead of using a custom binary structure. For serialization formats there are many options. Any guidance or insight from those who have experience with these kinds of encodings is appreciated.

Requirements: In our format, data need to be packed in records, normally no bigger than 4096 bytes. Each record must be independently usable. The data must be readable for decades to come. Data archiving and exchange are done by storing and transmitting a sequence of records. Data corruption must only affect the corrupted records, leaving all others in the file/stream/object readable.
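As a rough illustration of the record requirements above, here is a minimal Python sketch in which every record carries its own checksum, so a reader can skip a damaged record and keep going. The layout is purely hypothetical (a fixed 4096-byte record, a CRC32 field, a 2-byte length, zero padding) and is not part of any proposed format; the payload could be the output of any of the encodings under consideration.

```python
import struct
import zlib

RECORD_SIZE = 4096               # assumed fixed record length (hypothetical)
MAX_PAYLOAD = RECORD_SIZE - 6    # 4-byte CRC32 + 2-byte payload length


def pack_record(payload: bytes) -> bytes:
    """Wrap an already-serialized payload in one fixed-size record."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in one record")
    # Body = payload length + payload, zero-padded to fill the record.
    body = struct.pack(">H", len(payload)) + payload.ljust(MAX_PAYLOAD, b"\x00")
    # The CRC32 covers the whole body, including length and padding.
    return struct.pack(">I", zlib.crc32(body)) + body


def read_records(stream: bytes):
    """Yield (index, payload); payload is None when the checksum fails."""
    for i in range(len(stream) // RECORD_SIZE):
        record = stream[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
        (crc,) = struct.unpack_from(">I", record, 0)
        body = record[4:]
        if zlib.crc32(body) != crc:
            yield i, None        # corrupted record: skip it, keep reading
            continue
        (length,) = struct.unpack_from(">H", body, 0)
        yield i, body[2:2 + length]
```

Because the records are fixed-size in this sketch, the reader never has to trust a length field from a damaged record to find the next one; variable-size records would need a sync marker or similar to get the same property.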

Priorities (roughly in order) are:

  • stability, long term archive usage
  • performance, mostly read
  • ability to store opaque blobs
  • size
  • simplicity
  • broad software (aka library) support
  • stream-ability, transmitted and readable as a record is generated (if possible)

We have started to look at Protobuf (Protocol Buffers RFC), CBOR (RFC) and a bit at MessagePack.

Any information from those with experience that would help us determine the best fit or, more importantly, avoid pitfalls and dead-ends, would be greatly appreciated.

Thanks in advance!

terse asked Oct 23 '25 21:10



1 Answer

Late answer, but: you may want to decide first whether you want a schema-based or a self-describing format. The Amazon Ion overview discusses some of the pros and cons of these design decisions, as does this other ION (completely different from Amazon Ion).

Neither of those fully meets your criteria, but these articles should list a few criteria you might want to consider. Obviously, actually being a standard and being widely adopted are far stronger guarantees of longevity than any technical design criterion.
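To make the schema-based versus self-describing distinction concrete, here is a small Python sketch. It uses only the standard library as a stand-in: `json` plays the self-describing role (as CBOR and MessagePack broadly do), and a fixed `struct` layout plays the schema-based role (as Protobuf broadly does). The field names and layout are made up for illustration.

```python
import json
import struct

sample = {"station": 17, "temperature": 21.5}   # hypothetical record content

# Self-describing: field names and types travel with the data, so a reader
# decades from now can decode it without any external definition.
self_describing = json.dumps(sample).encode("utf-8")
decoded = json.loads(self_describing)

# Schema-based: the layout lives outside the data. The bytes are compact,
# but meaningless unless the schema is archived alongside them.
SCHEMA = struct.Struct(">Hd")    # assumed: uint16 station, float64 temperature
schema_based = SCHEMA.pack(sample["station"], sample["temperature"])
station, temperature = SCHEMA.unpack(schema_based)

print(len(self_describing), len(schema_based))
```

The schema-based bytes come out smaller, but for a long term archive the schema itself becomes one more artifact that has to be preserved and versioned.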

kert answered Oct 26 '25 13:10
