
Cannot use some unicode character as a struct tag

I'm trying to build a program that extracts data from JSON and puts it into a custom struct. The JSON contains keys like "foo\u00a0", so I have to use struct tags to get these values.

I have this code:

package main

import (
    "encoding/json"
    "fmt"
)

type MyStruct struct {
    X string `json:"foobar\u0062"`
    Y string `json:"foobaz\u00a0"`
}

func main() {
    data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)

    var ms MyStruct
    err := json.Unmarshal(data, &ms)
    if err != nil {
        panic(err)
    }

    fmt.Printf("First: %s\n", ms.X)
    fmt.Printf("Second: %s\n", ms.Y)

}

But it prints:

First: Bar
Second: 

It does not print the second value.

I tested it with different values from the Latin-1 Supplement block, and apparently:

  • it works with 00b5, 00f9
  • not with 00a1, 00a2, 00ab, 00af, 00b0

My questions:

  1. Why can the string foobar\u0062 be used as a tag but not foobaz\u00a0?
  2. If it's not possible, how can I get the value of a key of the form foobar\u00a0 from the JSON?
glacier asked Oct 19 '25 02:10


1 Answer

Struct tags themselves can contain escapes such as \u00a0, as this example shows:

package main

import (
    "fmt"
    "reflect"
)

type MyStruct struct {
    X string `json:"foobar\u0062"`
    Y string `json:"foobaz\u00a0"`
}

func main() {
    u := MyStruct{}
    t := reflect.TypeOf(u)

    // Print the raw tag and the unquoted value of its "json" key for each field.
    for _, fieldName := range []string{"X", "Y"} {
        field, found := t.FieldByName(fieldName)
        if !found {
            continue
        }
        fmt.Printf("\nField: %s\n", fieldName)
        fmt.Printf("\tWhole tag value : %s\n", field.Tag)
        fmt.Printf("\tValue of 'json': %q\n", field.Tag.Get("json"))
    }
}

This outputs:

Field: X
    Whole tag value : json:"foobar\u0062"
    Value of 'json': "foobarb"

Field: Y
    Whole tag value : json:"foobaz\u00a0"
    Value of 'json': "foobaz\u00a0"

However, the encoding/json package is stricter and does not allow such characters in the tag name. The restriction is in encoding/json/encode.go:

func isValidTag(s string) bool {
    if s == "" {
        return false
    }
    for _, c := range s {
        switch {
        case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
            // Backslash and quote chars are reserved, but
            // otherwise any punctuation chars are allowed
            // in a tag name.
        case !unicode.IsLetter(c) && !unicode.IsDigit(c):
            return false
        }
    }
    return true
}

So the json tag value of "foobar\u0062" is valid because '\u0062' is simply the 'b' character which is allowed.

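A json tag value of "foobaz\u00a0" is deemed invalid because '\u00a0' (a no-break space, neither a letter nor a digit) is rejected by isValidTag(). encoding/json then ignores the tag name and falls back to the field name Y, which does not match the key in the input, so the field is left empty. This restriction is historical and was added so that a json key can also be used for other purposes, such as protobuf keys.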
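To see how this rule lines up with the code points tried in the question, here is a minimal sketch. validTagName is a hypothetical local copy of the isValidTag logic quoted above (the real function is unexported), so treat it as an illustration rather than the package's API:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validTagName is a local copy of the isValidTag check quoted above,
// reproduced for illustration. It is not an exported encoding/json API.
func validTagName(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
			// Explicitly allowed ASCII punctuation (and the ASCII space).
		case !unicode.IsLetter(c) && !unicode.IsDigit(c):
			return false
		}
	}
	return true
}

func main() {
	// Code points tried in the question.
	for _, r := range []rune{0x62, 0xa0, 0xa1, 0xa2, 0xab, 0xaf, 0xb0, 0xb5, 0xf9} {
		fmt.Printf("U+%04X letter=%t valid=%t\n",
			r, unicode.IsLetter(r), validTagName("foo"+string(r)))
	}
}
```

Only U+0062, U+00B5 and U+00F9 are Unicode letters, so only those three are reported as valid, which matches what the question observed.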

If you want to unmarshal such input JSON using the encoding/json standard library package, you can't use struct tags for these keys. Use a map instead, for example:

data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)

var m map[string]any
err := json.Unmarshal(data, &m)
if err != nil {
    panic(err)
}
fmt.Println("X:", m["foobar\u0062"])
fmt.Println("Y:", m["foobaz\u00a0"])

This will output:

X: Bar
Y: Baz
icza answered Oct 21 '25 16:10