
How to convert a JSON structure into a dataclass object tree?

Tags: python, json

I have a JSON data structure. Every object has a field called "type".

json_data_str = """
{
    "type" : "Game",
    "levels" : [
        {
            "type": "Level",
            "map" : {
                "type" : "SquareRoom",
                "name" : "Level 1",
                "width" : 100,
                "height" : 100
            },
            "waves" : [
                {
                    "type" : "Wave",
                    "enemies" : [
                        {
                            "type" : "Wizard",
                            "name" : "Gandalf"
                        },
                        {
                            "type" : "Archer",
                            "name" : "Legolass"
                        }
                    ]
                }
            ]
        }
    ]
}
"""

I want to convert this into an object tree composed of the following classes:

from dataclasses import dataclass
from typing import List

@dataclass
class GameObject:
    ...

@dataclass
class Character(GameObject):
    name: str

@dataclass
class Wave(GameObject):
    enemies: List[Character]

@dataclass
class Wizard(Character):
    ...

@dataclass
class Archer(Character):
    ...

@dataclass
class Map(GameObject):
    name: str

@dataclass
class SquareRoom(Map):
    width: int
    height: int

@dataclass
class Level(GameObject):
    waves: List[Wave]
    map: Map
    
@dataclass
class Game(GameObject):
    levels: List[Level]

I can unpack a simple JSON object into a dataclass quite easily using the ** operator, e.g.:

import json
from abc import ABC
from dataclasses import dataclass
from typing import Dict, Type

json_data_str = """
{
   "type" : "Person",
   "name" : "Bob",
   "age" : 29
}
"""

class GameObject(ABC):
    ...

@dataclass
class Person(GameObject):
    name: str
    age: int

game_object_registry: Dict[str, Type[GameObject]] = {}
game_object_registry['Person'] = Person

json_obj = json.loads(json_data_str)
obj_type = json_obj['type']
del json_obj['type']
ObjType = game_object_registry[obj_type]
ObjType(**json_obj)

But how can I extend this to work with nested objects?

I want it to create this data class instance:

game = Game(levels=[Level(map=SquareRoom(name="Level 1", width=100, height=100), waves=[Wave([Wizard(name="Gandalf"), Archer(name="Legolass")])])])

Here is my best attempt. I realise the logic doesn't make sense, but it might be a starting point; I cannot come up with a function that does make sense.

def json_to_game_object(json_obj: Any, game_object_registry: Dict[str, Type[GameObject]]) -> Any:

    if type(json_obj) is dict:
        obj_type: str = json_obj['type']
        del json_obj['type']
        ObjType = game_object_registry[obj_type]
        for key, value in json_obj.items():
            logging.debug(f'Parsing field "{key}:{value}"')
            json_to_game_object(value, game_object_registry)
            if type(value) is dict:
                logging.debug(f'Creating object of type {ObjType} with args {value}')
                return ObjType(**value)
    elif type(json_obj) is list:
        logging.debug(f'Parsing JSON List')
        for elem in json_obj:
            logging.debug(f'Parsing list element "{json_obj.index(elem)}"')
            json_to_game_object(elem, game_object_registry)
    else:
        logging.debug(f'Parsing value')

Blue7 asked Sep 13 '25 22:09


2 Answers

Assuming you have control over the JSON / dict structure, you can use a library like dacite.

It will let you map the data into your dataclasses.

Example (taken from the dacite GitHub repository) below:

from dataclasses import dataclass

from dacite import from_dict


@dataclass
class A:
    x: str
    y: int


@dataclass
class B:
    a: A


data = {
    'a': {
        'x': 'test',
        'y': 1,
    }
}

result = from_dict(data_class=B, data=data)

assert result == B(a=A(x='test', y=1))
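
If you would rather not add a dependency, the registry idea from the question can be extended to nested data with a short recursive function. The following is only a minimal sketch using the standard library: the names REGISTRY and build are made up for illustration, only a subset of the question's classes is defined, and it assumes every tagged object uses the "type" key and that the remaining keys match the dataclass fields exactly. The key point is that children are converted before the object that contains them, and the input dict is not mutated.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Type


@dataclass
class GameObject:
    ...


@dataclass
class Character(GameObject):
    name: str


@dataclass
class Wizard(Character):
    ...


@dataclass
class Archer(Character):
    ...


@dataclass
class Wave(GameObject):
    enemies: List[Character]


# Hypothetical registry: maps each "type" tag to the class it should build.
REGISTRY: Dict[str, Type[GameObject]] = {
    cls.__name__: cls for cls in (Wizard, Archer, Wave)
}


def build(node: Any) -> Any:
    """Convert parsed JSON bottom-up: children first, then the enclosing object."""
    if isinstance(node, list):
        return [build(item) for item in node]
    if isinstance(node, dict):
        # Convert every value except the tag before constructing this object.
        kwargs = {key: build(value) for key, value in node.items() if key != "type"}
        if "type" not in node:
            return kwargs  # untagged dicts stay plain dicts
        return REGISTRY[node["type"]](**kwargs)
    return node  # str, int, float, bool and None pass through unchanged


wave = build(json.loads(
    '{"type": "Wave", "enemies": ['
    '{"type": "Wizard", "name": "Gandalf"}, '
    '{"type": "Archer", "name": "Legolass"}]}'
))
print(wave)
# Wave(enemies=[Wizard(name='Gandalf'), Archer(name='Legolass')])
```

An unknown tag raises KeyError and an unexpected key raises TypeError from the dataclass constructor, so this sketch does no validation beyond what the constructors enforce.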

balderman answered Sep 16 '25 11:09


As an alternative, you could also use the dataclass-wizard library for this.

Recent versions support dataclasses in Union types. As of v0.19.0, you can also set tag_key in the Meta config of the main dataclass to choose which field in each JSON object holds the tag that identifies the dataclass within a Union type. The default tag key is __tag__; in your case it would be the type field.

I've also removed this type field entirely in cases where it was not really needed. You only need such a tag field when a field maps to one or more dataclass types via a Union declaration. The main benefit of defining a custom tag for each class is that if you later rename the class, existing JSON data can still be de-serialized into the nested dataclass model as expected.

The example below should work on Python 3.7+ thanks to the included __future__ import, which lets you use PEP 585 and PEP 604 style annotations for a more convenient shorthand syntax.

from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class GameObject:
    ...


@dataclass
class Game(GameObject, JSONWizard):

    class _(JSONWizard.Meta):
        # Set tag key in JSON object; defaults to '__tag__' if not specified.
        tag_key = 'type'

    levels: list[Level]


@dataclass
class Level(GameObject):
    waves: list[Wave]
    # TODO: define other map classes
    map: SquareRoom | Map


@dataclass
class Wave(GameObject):
    enemies: list[Wizard | Archer]


@dataclass
class Character(GameObject):
    name: str


@dataclass
class Wizard(Character, JSONWizard):

    class _(JSONWizard.Meta):
        tag = 'Wizard'

    ...


@dataclass
class Archer(Character, JSONWizard):

    class _(JSONWizard.Meta):
        tag = 'Archer'

    ...


@dataclass
class Map(GameObject):
    name: str


@dataclass
class SquareRoom(Map, JSONWizard):

    class _(JSONWizard.Meta):
        tag = 'SquareRoom'

    width: int
    height: int


def main():
    json_data_str = """
    {
        "levels": [
            {
                "map": {
                    "type": "SquareRoom",
                    "name": "Level 1",
                    "width": 100,
                    "height": 100
                },
                "waves": [
                    {
                        "enemies": [
                            {
                                "type": "Wizard",
                                "name": "Gandalf"
                            },
                            {
                                "type": "Archer",
                                "name": "Legolass"
                            }
                        ]
                    }
                ]
            }
        ]
    }
    """

    game = Game.from_json(json_data_str)
    print(repr(game))
    print('Prettified JSON result:\n', game)


if __name__ == '__main__':
    main()

Output:

Game(levels=[Level(waves=[Wave(enemies=[Wizard(name='Gandalf'), Archer(name='Legolass')])], map=SquareRoom(name='Level 1', width=100, height=100))])
Prettified JSON result:
 {
  "levels": [
    {
      "waves": [
        {
          "enemies": [
            {
              "name": "Gandalf",
              "type": "Wizard"
            },
            {
              "name": "Legolass",
              "type": "Archer"
            }
          ]
        }
      ],
      "map": {
        "name": "Level 1",
        "width": 100,
        "height": 100,
        "type": "SquareRoom"
      }
    }
  ]
}
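
The prettified output above shows the tag being written back during serialization. For comparison, the standard library's dataclasses.asdict discards class identity, so a hand-rolled version has to add the tag itself. The following is a rough sketch only: dump is a hypothetical helper, the two classes are cut-down stand-ins, and unlike the library output above it tags every dataclass rather than just the members of a Union.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List


@dataclass
class Wizard:
    name: str


@dataclass
class Wave:
    enemies: List[Wizard]


def dump(node: Any) -> Any:
    """Recursively turn dataclass instances into dicts carrying a "type" tag."""
    if is_dataclass(node) and not isinstance(node, type):
        out = {f.name: dump(getattr(node, f.name)) for f in fields(node)}
        out["type"] = type(node).__name__  # tag is added after the real fields
        return out
    if isinstance(node, (list, tuple)):
        return [dump(item) for item in node]
    return node  # plain JSON values pass through unchanged


result = dump(Wave(enemies=[Wizard(name="Gandalf")]))
print(json.dumps(result))
# {"enemies": [{"name": "Gandalf", "type": "Wizard"}], "type": "Wave"}
```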

To save time and avoid manually defining a tag for each dataclass in a Union type, you can enable the auto_assign_tags flag on the main dataclass; by default this assigns the class name as the tag for each nested class. It also lets you drop JSONWizard from the nested dataclasses so that only the main dataclass uses it, as shown below.

from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class GameObject:
    ...


@dataclass
class Game(GameObject, JSONWizard):

    class _(JSONWizard.Meta):
        # Set tag key in JSON object; defaults to '__tag__' if not specified.
        tag_key = 'type'
        auto_assign_tags = True

    levels: list[Level]


@dataclass
class Level(GameObject):
    waves: list[Wave]
    # TODO: define other map classes
    map: SquareRoom | Map


@dataclass
class Wave(GameObject):
    enemies: list[Wizard | Archer]


@dataclass
class Character(GameObject):
    name: str


@dataclass
class Wizard(Character):
    ...


@dataclass
class Archer(Character):
    ...


@dataclass
class Map(GameObject):
    name: str


@dataclass
class SquareRoom(Map):
    width: int
    height: int

The only caveat with this approach is that if you later rename a dataclass, such as the Archer class, existing JSON data can no longer be de-serialized unless you manually specify a tag for that class.

The output should be the same as in the first example, where we explicitly specified a tag for each dataclass in a Union declaration.
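
The renaming caveat can be illustrated without any library. This is a hedged, stdlib-only sketch of the general idea, not how dataclass-wizard is implemented: TAGS and tagged are hypothetical names, and the renamed classes Bowman and Sorcerer are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

TAGS: Dict[str, type] = {}  # hypothetical tag -> class lookup


def tagged(tag: Optional[str] = None) -> Callable[[type], type]:
    """Register a class under an explicit tag, or under its class name if none is given."""
    def decorator(cls: type) -> type:
        TAGS[tag or cls.__name__] = cls
        return cls
    return decorator


@tagged("Archer")  # explicit tag: still matches after renaming Archer to Bowman
@dataclass
class Bowman:
    name: str


@tagged()  # automatic tag: follows the class name, so a rename changes it
@dataclass
class Sorcerer:  # imagine this class used to be called Wizard
    name: str


old_record = {"type": "Archer", "name": "Legolass"}
print(TAGS[old_record["type"]](name=old_record["name"]))
# Bowman(name='Legolass')
print("Wizard" in TAGS)
# False
```

Old data tagged "Archer" still loads because the tag is decoupled from the class name, while old data tagged "Wizard" no longer has a matching entry.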


If you need data validation or if you want to retain the type field in the JSON object, I'd also suggest pydantic as another solution. In addition, you can use pydantic drop-in dataclasses and retain the @dataclass usage for the rest of the model classes, as shown below.

from typing import List, Union

from pydantic import BaseModel
from pydantic.dataclasses import dataclass
from typing_extensions import Literal


@dataclass
class GameObject:
    ...


@dataclass
class Character(GameObject):
    name: str


@dataclass
class Wizard(Character):
    type: Literal['Wizard']


@dataclass
class Archer(Character):
    type: Literal['Archer']


@dataclass
class Wave(GameObject):
    enemies: List[Union[Wizard, Archer]]


@dataclass
class Map(GameObject):
    name: str


@dataclass
class SquareRoom(Map):
    type: Literal['SquareRoom']
    width: int
    height: int


@dataclass
class Level(GameObject):
    waves: List[Wave]
    # TODO: define other map classes
    map: Union[SquareRoom, Map]


class Game(BaseModel, GameObject):
    levels: List[Level]


def main():
    json_data_str = """
    {
        "levels": [
            {
                "map": {
                    "type": "SquareRoom",
                    "name": "Level 1",
                    "width": 100,
                    "height": 100
                },
                "waves": [
                    {
                        "enemies": [
                            {
                                "type": "Wizard",
                                "name": "Gandalf"
                            },
                            {
                                "type": "Archer",
                                "name": "Legolass"
                            }
                        ]
                    }
                ]
            }
        ]
    }
    """

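    # parse_raw is the pydantic v1 API; pydantic v2 deprecates it in favour of
    # Game.model_validate_json(json_data_str).
    game = Game.parse_raw(json_data_str)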
    print(repr(game))


if __name__ == '__main__':
    main()

The output in this case is slightly different: when you print the repr of the Game object, you also see the type fields, since technically each one is a dataclass field.

Game(levels=[Level(waves=[Wave(enemies=[Wizard(name='Gandalf', type='Wizard'), Archer(name='Legolass', type='Archer')])], map=SquareRoom(name='Level 1', type='SquareRoom', width=100, height=100))])

rv.kvetch answered Sep 16 '25 12:09