Considerations for binary serializations (Protobuf, CBOR, MessagePack, etc.) for a long-term archive data format

Question

In discussions about a next-generation scientific data format, a need for some kind of JSON-like data structure (logical grouping of fields) has been identified. Additionally, it would be preferable to leverage an existing encoding instead of using a custom binary structure. There are many serialization formats to choose from. Any guidance or insight from those who have experience with these kinds of encodings is appreciated.

Requirements: In our format, data must be packed into records, normally no larger than 4096 bytes. Each record must be independently usable. The data must remain readable for decades to come. Data archiving and exchange are done by storing and transmitting a sequence of records. Data corruption must affect only the corrupted records, leaving all others in the file/stream/object readable.
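To make the record requirements concrete, here is a minimal sketch of one way to frame records so that damage stays local. It is not our actual format: the two-byte sync marker, the header layout, and the function names are all hypothetical, and the payload could be any encoding (Protobuf, CBOR, MessagePack, ...).

```python
import struct
import zlib

MAGIC = b"RC"                      # hypothetical sync marker, not from any real format
HEADER = struct.Struct(">2sHI")    # marker, payload length, CRC-32 of the payload
MAX_RECORD = 4096                  # record size limit from the requirements


def pack_record(payload: bytes) -> bytes:
    """Wrap an opaque payload in a small self-checking header."""
    if HEADER.size + len(payload) > MAX_RECORD:
        raise ValueError("record would exceed 4096 bytes")
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def iter_records(buf: bytes):
    """Yield the payloads of intact records, skipping damaged ones."""
    pos = 0
    while True:
        pos = buf.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(buf):
            return
        _, length, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        payload = buf[start:start + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            yield payload
            pos = start + length   # jump to the next record
        else:
            pos += 1               # no valid record here; keep scanning
```

With this framing, a reader that hits a checksum mismatch simply scans forward to the next marker, so a flipped byte in one record costs only that record. The checksum covers the payload only, which is a simplification; a real design would probably protect the header as well.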

Priorities (roughly in order) are:

stability, long term archive usage
performance, mostly read
ability to store opaque blobs
size
simplicity
broad software (aka library) support
stream-ability, transmitted and readable as a record is generated (if possible)

We have started to look at Protobuf (Protocol Buffers), CBOR (an IETF RFC), and, briefly, MessagePack.

Any information from those with experience that would help us determine the best fit or, more importantly, avoid pitfalls and dead-ends, would be greatly appreciated.

Thanks in advance!

kert · Accepted Answer

Late answer, but: you may want to decide first whether you want a schema-based or a self-describing format. The Amazon Ion overview discusses some of the pros and cons of these design decisions, as does the documentation for another format called ION (completely different from Amazon Ion).

Neither of those fully meets your criteria, but these articles list a few criteria you might want to consider. Being an actual standard and being widely adopted are far stronger guarantees of longevity than any technical design criterion.
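To illustrate the schema-based versus self-describing distinction, here is a rough sketch using only the Python standard library. The field names and layout are made up, `struct` stands in for a schema-based encoding such as Protobuf, and JSON stands in for a self-describing one (CBOR and MessagePack follow the same model in binary form).

```python
import json
import struct

# Schema-based: field names, order, and types live outside the data.
# Hypothetical layout: uint32 station id followed by a float64 value.
SCHEMA = struct.Struct(">Id")


def encode_with_schema(station: int, value: float) -> bytes:
    return SCHEMA.pack(station, value)


def decode_with_schema(data: bytes) -> dict:
    # Without SCHEMA, these bytes cannot be interpreted.
    station, value = SCHEMA.unpack(data)
    return {"station": station, "value": value}


# Self-describing: names and types travel with every record.
def encode_self_describing(station: int, value: float) -> bytes:
    return json.dumps({"station": station, "value": value}).encode("utf-8")


def decode_self_describing(data: bytes) -> dict:
    # Any reader can recover the structure with no external definition.
    return json.loads(data.decode("utf-8"))
```

The schema-based record is smaller, but a reader decades from now needs the schema to survive alongside the data. The self-describing record pays for its field names in every record, which matters when records are capped at a few kilobytes.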

Tags: json, format, protocol-buffers, msgpack, cbor, terse
