
Python how to access dataclass properties in list of dataclasses

Using python 3.10.4

Hi all, I'm putting together a script that reads a YAML file with k8s cluster info, and I'd like to treat the loaded YAML as dataclasses so I can access the values as attributes with dot notation.

Example yaml:

account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef

And here's my script for loading and accessing it:

import yaml
from dataclasses import dataclass

@dataclass
class ClusterInfo:
    _name: str
    _endpoint: str
    _certificate: str

@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]


clusters = yaml.safe_load(open(r'D:\git\clusters.yml', 'r'))
a = AWSInfo(
  _account=clusters['account'],
  _clusters=clusters['clusters']
)
print(a._account) #prints 12345
print(a._clusters) #prints the list of both cluster dicts
print(a._clusters[0]) #prints the dict of the first cluster

#These prints fail with AttributeError: 'dict' object has no attribute '_endpoint'
print(a._clusters[0]._endpoint)
for c in a._clusters:
    print(c._endpoint)

So my question is: What am I doing wrong on the last prints? How can I access the properties of each member in a dataclass array of dataclass objects?

user3066571 asked Sep 12 '25 08:09


2 Answers

The dataclasses module doesn't provide built-in support for this use case, i.e. loading YAML data to a nested class model.
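To make that concrete: type annotations on dataclass fields are not acted on at runtime, so a list of dicts stays a list of dicts. Below is a minimal stdlib-only sketch of a manual workaround using `__post_init__`. It assumes the field names from the question without the leading underscores, and it uses an inline dict as a stand-in for the result of `yaml.safe_load`.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    name: str
    endpoint: str
    certificate: str


@dataclass
class AWSInfo:
    account: int
    clusters: list[ClusterInfo]

    def __post_init__(self):
        # The annotation above is not enforced at runtime, so convert any
        # plain dicts (as produced by yaml.safe_load) into ClusterInfo by hand.
        self.clusters = [
            c if isinstance(c, ClusterInfo) else ClusterInfo(**c)
            for c in self.clusters
        ]


# Stand-in for the result of yaml.safe_load() on the question's file.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
    ],
}

info = AWSInfo(**loaded)
for c in info.clusters:
    print(c.endpoint)
```

This works for a small, fixed model, but every nested field needs its own hand-written conversion, which is the gap a dedicated library fills.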

In such a scenario, I would turn to a ser/de (serialization/deserialization) library such as dataclass-wizard, which provides out-of-the-box support for (de)serializing YAML data via the PyYAML library.

Disclaimer: I am the creator and maintainer of this library.

Step 1: Generate a Dataclass Model

Note: This step could be easier for YAML data. Ideally the dataclass model would be generated from the CLI, but the CLI utility expects JSON input, so the YAML has to be converted to JSON first. It may be worth creating an issue so I can look into this as time allows.

For now, it is easiest to do this in Python itself:

from json import dumps

# pip install PyYAML dataclass-wizard
from yaml import safe_load
from dataclass_wizard.wizard_cli import PyCodeGenerator

yaml_string = """
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
"""

py_code = PyCodeGenerator(experimental=True, file_contents=dumps(safe_load(yaml_string))).py_code
print(py_code)

Prints:

from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    account: int
    clusters: list[Cluster]


@dataclass
class Cluster:
    """
    Cluster dataclass

    """
    name: str
    endpoint: str
    certificate: str

Step 2: Use Generated Dataclass Model, alongside YAMLWizard

Contents of my_file.yml:

account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_5
    certificate: abcdef
  - name: cluster_2
    endpoint: https://cluster_7
    certificate: xyz

Python code:

from __future__ import annotations

from dataclasses import dataclass
from pprint import pprint

from dataclass_wizard import YAMLWizard


@dataclass
class Data(YAMLWizard):
    account: int
    clusters: list[Cluster]


@dataclass
class Cluster:
    name: str
    endpoint: str
    certificate: str


data = Data.from_yaml_file('./my_file.yml')
pprint(data)
for c in data.clusters:
    print(c.endpoint)

Result:

Data(account=12345,
     clusters=[Cluster(name='cluster_1',
                       endpoint='https://cluster_5',
                       certificate='abcdef'),
               Cluster(name='cluster_2',
                       endpoint='https://cluster_7',
                       certificate='xyz')])
https://cluster_5
https://cluster_7
rv.kvetch answered Sep 14 '25 20:09


As Barmar points out in a comment, even though you have correctly annotated the _clusters field in your AWSInfo dataclass...

@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]

...the dataclasses module does not act on type annotations at runtime, so it won't automatically convert the members of the clusters list in your input data into the appropriate data type. If you use a more comprehensive data model library like Pydantic, things will work as you expect:

import yaml
from pydantic import BaseModel

class ClusterInfo(BaseModel):
    name: str
    endpoint: str
    certificate: str

class AWSInfo(BaseModel):
    account: int
    clusters: list[ClusterInfo]


with open('clusters.yml', 'r') as fd:
    clusters = yaml.safe_load(fd)
a = AWSInfo(**clusters)

print(a.account) #prints 12345
print(a.clusters) #prints the list of both ClusterInfo objects
print(a.clusters[0]) #prints the first ClusterInfo object

#These prints now work, because each cluster is a ClusterInfo instance
print(a.clusters[0].endpoint)
for c in a.clusters:
    print(c.endpoint)

Running the above code (with your sample input) produces:

12345
[ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef'), ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef')]
name='cluster_1' endpoint='https://cluster_2' certificate='abcdef'
https://cluster_2
https://cluster_2
https://cluster_2
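To illustrate the kind of nested conversion a data model library performs for you, here is a rough stdlib-only sketch that walks the type hints of a plain dataclass. The helper name `from_dict` is hypothetical (it is not part of `dataclasses` or Pydantic), it only handles nested dataclasses and `list[...]` fields, and it does no validation or coercion, which Pydantic does in addition.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


def from_dict(tp: Any, data: Any) -> Any:
    """Recursively build `tp` from plain dicts and lists (hypothetical helper)."""
    if get_origin(tp) is list and isinstance(data, list):
        # list[X]: convert every element to X.
        (item_type,) = get_args(tp)
        return [from_dict(item_type, item) for item in data]
    if is_dataclass(tp) and isinstance(data, dict):
        # Dataclass: convert each field according to its type hint.
        # A missing key raises KeyError; defaults are not handled here.
        hints = get_type_hints(tp)
        return tp(**{f.name: from_dict(hints[f.name], data[f.name]) for f in fields(tp)})
    # Anything else (int, str, ...) is passed through unchanged.
    return data


@dataclass
class ClusterInfo:
    name: str
    endpoint: str
    certificate: str


@dataclass
class AWSInfo:
    account: int
    clusters: list[ClusterInfo]


# Stand-in for the result of yaml.safe_load() on the sample input.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
    ],
}

info = from_dict(AWSInfo, loaded)
print(info.clusters[0].endpoint)
```

Unlike a hand-written conversion per class, this approach follows the annotations, so adding another nested dataclass field needs no extra code. For anything beyond a small script, a maintained library is likely the safer choice.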
larsks answered Sep 14 '25 20:09