Using Python 3.10.4
Hi all, I'm putting together a script that reads a YAML file with k8s cluster info, and I'd like to load the YAML into dataclasses so I can access the values with dot notation (attribute access).
Example YAML:
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
And here's my script for loading and accessing it:
import yaml
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    _name: str
    _endpoint: str
    _certificate: str


@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]


clusters = yaml.safe_load(open(r'D:\git\clusters.yml', 'r'))

a = AWSInfo(
    _account=clusters['account'],
    _clusters=clusters['clusters']
)

print(a._account)  # prints 12345
print(a._clusters)  # prints the list of both cluster dicts
print(a._clusters[0])  # prints the dict of the first cluster

# These prints fail with AttributeError: 'dict' object has no attribute '_endpoint'
print(a._clusters[0]._endpoint)
for c in a._clusters:
    print(c._endpoint)
So my question is: What am I doing wrong on the last prints? How can I access the properties of each member in a dataclass array of dataclass objects?
The dataclasses module doesn't provide built-in support for this use case, i.e. loading YAML data into a nested class model.
In such a scenario, I would turn to a ser/de library such as dataclass-wizard, which provides out-of-the-box support for (de)serializing YAML data via the PyYAML library.
Disclaimer: I am the creator and maintainer of this library.
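To see concretely why the last prints in the question fail: a dataclass simply stores whatever values it is given, and the list[ClusterInfo] annotation is never used to convert or validate anything at runtime. A minimal check, using a hand-written dict as a stand-in for what yaml.safe_load returns for the question's file:

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    _name: str
    _endpoint: str
    _certificate: str


@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]


# Stand-in for yaml.safe_load() output: plain dicts and lists only.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
    ],
}

a = AWSInfo(_account=loaded["account"], _clusters=loaded["clusters"])

# The annotation says list[ClusterInfo], but nothing converted the items:
print(type(a._clusters[0]))  # <class 'dict'>
print(a._clusters[0]["endpoint"])  # https://cluster_2 (dict-style access works)
```

So the items are still the dicts produced by the YAML loader, and dict keys are not attributes.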
Note: I will likely need to make this step (generating a dataclass model for YAML data) easier. Perhaps worth creating an issue to look into as time allows. Ideally this would be done from the CLI; however, since we have YAML data, that is tricky, because the CLI utility expects JSON.
So, for now, it's easiest to do this in Python itself:
# pip install PyYAML dataclass-wizard
from json import dumps

from yaml import safe_load
from dataclass_wizard.wizard_cli import PyCodeGenerator

yaml_string = """
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
"""

py_code = PyCodeGenerator(experimental=True, file_contents=dumps(safe_load(yaml_string))).py_code
print(py_code)
Prints:
from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass
    """
    account: int
    clusters: list[Cluster]


@dataclass
class Cluster:
    """
    Cluster dataclass
    """
    name: str
    endpoint: str
    certificate: str
YAMLWizard
Contents of my_file.yml:
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_5
    certificate: abcdef
  - name: cluster_2
    endpoint: https://cluster_7
    certificate: xyz
Python code:
from __future__ import annotations

from dataclasses import dataclass
from pprint import pprint

from dataclass_wizard import YAMLWizard


@dataclass
class Data(YAMLWizard):
    account: int
    clusters: list[Cluster]


@dataclass
class Cluster:
    name: str
    endpoint: str
    certificate: str


data = Data.from_yaml_file('./my_file.yml')
pprint(data)

for c in data.clusters:
    print(c.endpoint)
Result:
Data(account=12345,
     clusters=[Cluster(name='cluster_1',
                       endpoint='https://cluster_5',
                       certificate='abcdef'),
               Cluster(name='cluster_2',
                       endpoint='https://cluster_7',
                       certificate='xyz')])
https://cluster_5
https://cluster_7
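Roughly speaking, what a library like this saves you from is walking the type annotations and rebuilding the nested objects by hand. Purely as an illustration of that general idea (this is not dataclass-wizard's actual implementation), here is a minimal, stdlib-only sketch. from_dict is a hypothetical helper name; it only handles nested dataclasses, list[...] fields and plain scalars, performs no type checking, and relies on typing.get_type_hints to resolve the annotations. The hand-written dict stands in for yaml.safe_load output on my_file.yml:

```python
import dataclasses
from typing import Any, get_args, get_origin, get_type_hints


def from_dict(cls: Any, data: Any) -> Any:
    """Hypothetical helper: rebuild nested dataclasses from plain dicts/lists."""
    if dataclasses.is_dataclass(cls) and isinstance(data, dict):
        hints = get_type_hints(cls)  # field name -> annotated type
        kwargs = {
            f.name: from_dict(hints[f.name], data[f.name])
            for f in dataclasses.fields(cls)
            if f.name in data  # missing keys fall back to field defaults (if any)
        }
        return cls(**kwargs)
    if get_origin(cls) is list:
        (item_type,) = get_args(cls)
        return [from_dict(item_type, item) for item in data]
    return data  # scalars (int, str, ...) are passed through unchanged


@dataclasses.dataclass
class Cluster:
    name: str
    endpoint: str
    certificate: str


@dataclasses.dataclass
class Data:
    account: int
    clusters: list[Cluster]


# Stand-in for yaml.safe_load() output on my_file.yml.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_5", "certificate": "abcdef"},
        {"name": "cluster_2", "endpoint": "https://cluster_7", "certificate": "xyz"},
    ],
}

data = from_dict(Data, loaded)
for c in data.clusters:
    print(c.endpoint)
```

A real ser/de library also has to deal with optional fields, unions, key-name casing, validation and error messages, which is why reaching for one is usually simpler than growing a helper like this.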
As Barmar points out in a comment, even though you have correctly typed the _clusters field in your AWSInfo dataclass...

@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]

...the dataclasses module isn't smart enough to automatically convert the members of the clusters list in your input data into the appropriate data type. If you use a more comprehensive data model library like Pydantic, things will work like you expect:
import yaml
from pydantic import BaseModel


class ClusterInfo(BaseModel):
    name: str
    endpoint: str
    certificate: str


class AWSInfo(BaseModel):
    account: int
    clusters: list[ClusterInfo]


with open('clusters.yml', 'r') as fd:
    clusters = yaml.safe_load(fd)

a = AWSInfo(**clusters)

print(a.account)  # prints 12345
print(a.clusters)  # prints the list of both ClusterInfo models
print(a.clusters[0])  # prints the first ClusterInfo model

# These prints now work, because each cluster is a ClusterInfo, not a dict
print(a.clusters[0].endpoint)
for c in a.clusters:
    print(c.endpoint)
Running the above code (with your sample input) produces:
12345
[ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef'), ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef')]
name='cluster_1' endpoint='https://cluster_2' certificate='abcdef'
https://cluster_2
https://cluster_2
https://cluster_2
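If you'd rather stay with the standard-library dataclasses module, one option is to do the conversion yourself in __post_init__, which dataclasses calls right after the generated __init__. A minimal sketch, assuming the field names are changed to match the YAML keys (name instead of _name, and so on) so that ClusterInfo(**c) works, and using a hand-written dict in place of the yaml.safe_load output:

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    name: str
    endpoint: str
    certificate: str


@dataclass
class AWSInfo:
    account: int
    clusters: list[ClusterInfo]

    def __post_init__(self) -> None:
        # Runs after the generated __init__: turn any plain dicts (as produced
        # by yaml.safe_load) into ClusterInfo instances, keep existing ones.
        self.clusters = [
            c if isinstance(c, ClusterInfo) else ClusterInfo(**c)
            for c in self.clusters
        ]


# Stand-in for yaml.safe_load() output on the question's sample file.
loaded = {
    "account": 12345,
    "clusters": [
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
        {"name": "cluster_1", "endpoint": "https://cluster_2", "certificate": "abcdef"},
    ],
}

a = AWSInfo(**loaded)
print(a.clusters[0].endpoint)  # https://cluster_2
for c in a.clusters:
    print(c.endpoint)
```

The trade-off is that every nested level needs its own hand-written conversion, which is exactly the work that dataclass-wizard and Pydantic automate.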