
Difference between binary-encoded and JSON-encoded Avro

Based on the Avro documentation, I understand that binary-encoded Avro has two important parts. One is the Avro schema, i.e. the .avsc file (written in JSON), which describes the fields of the data. The other is the actual data, which is binary encoded.

There is very little documentation on JSON-encoded Avro, so I am trying to understand whether it follows the same semantics: is there an Avro schema file in JSON format (i.e. an .avsc file) followed by a payload that is encoded in JSON? Or is it just the payload alone, encoded entirely as JSON, with the value for each key binary encoded?

I am experimenting with Python, so any leads or sample code would help.

Thanks!

Rakshith Venkatesh asked Mar 09 '26 19:03


1 Answer

The binary and JSON encodings have to do only with the payload itself. For example, if you had a schema like this:

{
  "type": "record",
  "name": "test",
  "fields" : [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "string"}
  ]
}

And you had a record whose a field has a value of 27 and whose b field has a value of "foo", then the binary encoding would be the following hex byte sequence:

36 06 66 6f 6f

Whereas the JSON encoding would simply be:

{"a": 27, "b": "foo"}

The binary format is much more compact, but of course the JSON format is much more readable.
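To see where the five bytes of the binary encoding come from, here is a minimal hand-rolled sketch using only the Python standard library. It is not how any Avro library is implemented. The helper names (zigzag, encode_long, encode_string, encode_record) are made up for illustration, and it covers only the long and string types used in the schema above. Per the Avro spec, a long is zig-zag encoded and written as a variable-length integer, and a string is written as its byte length (as a long) followed by its UTF-8 bytes.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Encode an Avro long: zig-zag, then 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(a: int, b: str) -> bytes:
    """A record is just its fields encoded in schema order, with no field names."""
    return encode_long(a) + encode_string(b)


payload = encode_record(27, "foo")
print(payload.hex(" "))  # 36 06 66 6f 6f
```

So 36 is the zig-zag form of 27 (27 * 2 = 54 = 0x36), 06 is the zig-zag form of the string length 3, and 66 6f 6f is "foo" in UTF-8. Note that no field names or types appear in the bytes, which is why the schema is needed to read them back.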

When you talk about having a schema and a payload encoded together as one output, you are really talking about Avro object container files (https://avro.apache.org/docs/current/spec.html#Object+Container+Files), and those use only the binary encoding. There is no specification for a container file that uses the JSON encoding. An Avro container file can be parsed without any prior knowledge because the schema is embedded in the file, but JSON-encoded Avro always needs the schema as a separate input, since the schema is not embedded in the output the way it is in a container file.

If you are using Python, the standard avro library doesn't support the JSON encoding as far as I know, but fastavro does. The docs for reading and writing are below:

https://fastavro.readthedocs.io/en/latest/json_reader.html
https://fastavro.readthedocs.io/en/latest/json_writer.html

Scott answered Mar 12 '26 07:03