
How to create an AVRO file using Go?

Tags: go, avro

I'm trying to create an AVRO file using Go. So far I've tried a couple of libraries and I have some code.

The problem is that I can work with the data but don't know how to serialize it to store it. Here's the code I got from github.com/hamba/avro with some small modifications.

package main

import (
    "fmt"
    "log"

    "github.com/hamba/avro"
)

type SimpleRecord struct {
    A int64  `avro:"a"`
    B string `avro:"b"`
}

func main() {
    schema, err := avro.Parse(`{
        "type": "record",
        "name": "simple",
        "namespace": "hamba",
        "fields" : [
            {"name": "a", "type": "long"},
            {"name": "b", "type": "string"}
        ]
    }`)
    if err != nil {
        log.Fatal(err)
    }

    in := SimpleRecord{A: 27, B: "foo"}

    data, err := avro.Marshal(schema, in)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(data)
}

This block of code prints:

[54 6 102 111 111]

These bytes are the Avro binary encoding of the record. It seems like this is all I need to store, but I don't know how to create the file itself.

I tried:

mode := int(0644)
permissions := os.FileMode(mode)
err = ioutil.WriteFile("file.avro", data, permissions)
if err != nil {
    log.Fatal(err)
}

This generates a file. However, when I try to read it as an AVRO file using the Python fastavro library, I get the error: ValueError: cannot read header - is it an avro file?

But according to the docs (https://godoc.org/github.com/hamba/avro#example-Marshal): "Marshal returns the Avro encoding of v." Marshal(schema Schema, v interface{}) ([]byte, error), so data should be of type []byte.

6659081 asked Jan 19 '26 03:01


1 Answer

Avro's binary encoding only defines how the data itself is encoded; that encoding can then be packaged as messages or as files. avro.Marshal gives you just the encoded data, so for file storage you should use the Avro Object Container File (OCF) format, which adds a header containing the schema. The hamba/avro ocf package includes a working encoder example.

In my code I've encoded multiple rows to upload them to BigQuery (error checks, init, and close are omitted for clarity):

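// os.Create opens the file for writing; os.Open would open it read-only.
f, err := os.Create("/your/avro/file.avro")
// The encoder writes the OCF header (including the schema) to f.
enc, err := ocf.NewEncoder(schema, f, ocf.WithCodec(ocf.Snappy))
for _, item := range items {
    enc.Encode(item)
}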
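
To see why the file from the question was rejected, it can help to check for the OCF header directly. Per the Avro specification, every object container file begins with the four magic bytes 'O', 'b', 'j', 1; the raw output of avro.Marshal does not, which is presumably what fastavro is complaining about. This is a minimal standard-library sketch, and `looksLikeOCF` is a hypothetical helper name, not part of hamba/avro.

```go
package main

import (
	"bytes"
	"fmt"
)

// ocfMagic is the marker at the start of every Avro Object Container File:
// the ASCII letters "Obj" followed by the byte 1.
var ocfMagic = []byte{'O', 'b', 'j', 1}

// looksLikeOCF reports whether data begins with the OCF magic bytes.
// It checks the marker only; it does not validate the rest of the header.
func looksLikeOCF(data []byte) bool {
	return bytes.HasPrefix(data, ocfMagic)
}

func main() {
	// The bytes printed in the question: a bare record, no container header.
	raw := []byte{54, 6, 102, 111, 111}
	fmt.Println(looksLikeOCF(raw)) // false

	// A buffer that starts with the magic bytes passes the check.
	fmt.Println(looksLikeOCF([]byte{'O', 'b', 'j', 1, 0})) // true
}
```

In practice you would read the first four bytes of the file (for example with os.ReadFile) and pass them to the check.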
sergeyfo answered Jan 22 '26 09:01


