I'm using PyYAML. Is there a way to define a YAML anchor so that it does not become part of the data structure returned by yaml.load? (I could remove "wifi_parm" from the dictionary afterwards, but I'm looking for a smarter way.)
example.yaml:
wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  name: connectivity
  <<: *wifi_params
test2:
  name: connectivity_5ghz
  <<: *wifi_params
load_example.py:
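import pprint
import yaml
with open('example.yaml', 'r') as f:
    # recent PyYAML versions require an explicit Loader; SafeLoader still
    # resolves anchors, aliases and merge keys (<<)
    result = yaml.load(f, Loader=yaml.SafeLoader)
pprint.pprint(result)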
prints:
{'test1': {'key': 2, 'name': 'connectivity', 'ssid': 1},
 'test2': {'key': 2, 'name': 'connectivity_5ghz', 'ssid': 1},
 'wifi_parm': {'key': 2, 'ssid': 1}}
I need:
{'test1': {'key': 2, 'name': 'connectivity', 'ssid': 1},
 'test2': {'key': 2, 'name': 'connectivity_5ghz', 'ssid': 1}}
The anchor information in PyYAML is discarded before you get the result from yaml.load(). This follows the YAML 1.1 specification that PyYAML implements ("... anchor names are a serialization detail and are discarded once composing is completed"), and it has not changed in the YAML 1.2 specification (from 2009). So in PyYAML you cannot find out which values were anchored by walking over your result (recursively), not without extensively modifying the parser.
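A possible workaround that stays within plain PyYAML, as a rough sketch rather than a supported feature: anchor names are still visible at the event level (yaml.parse), before composing discards them. A second pass over the events can record which top-level keys have an anchored value, and those keys can then be dropped from the loaded result. The helper names anchored_top_level_keys and load_without_anchor_holders are made up for illustration, and the sketch assumes a single document whose root is a mapping with plain string keys.

```python
import yaml

EXAMPLE = """\
wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  name: connectivity
  <<: *wifi_params
test2:
  name: connectivity_5ghz
  <<: *wifi_params
"""


def anchored_top_level_keys(text):
    """Return the top-level keys whose value defines an anchor (hypothetical helper)."""
    keys = []
    depth = 0           # how many collections we are currently inside
    pending_key = None  # top-level key still waiting for its value
    for event in yaml.parse(text):
        is_start = isinstance(event, (yaml.MappingStartEvent, yaml.SequenceStartEvent))
        is_end = isinstance(event, (yaml.MappingEndEvent, yaml.SequenceEndEvent))
        is_leaf = isinstance(event, (yaml.ScalarEvent, yaml.AliasEvent))
        if depth == 1 and (is_start or is_leaf):
            if pending_key is None:
                # assumption: top-level keys are plain scalars
                pending_key = event.value
            else:
                # an alias event also carries a name, but it only refers to an anchor
                if event.anchor and not isinstance(event, yaml.AliasEvent):
                    keys.append(pending_key)
                pending_key = None
        if is_start:
            depth += 1
        elif is_end:
            depth -= 1
    return keys


def load_without_anchor_holders(text):
    """Load the YAML, then drop the top-level keys that only exist to hold an anchor."""
    data = yaml.safe_load(text)
    for key in anchored_top_level_keys(text):
        data.pop(key, None)
    return data


print(anchored_top_level_keys(EXAMPLE))
print(load_without_anchor_holders(EXAMPLE))
```

Note that this drops every top-level key whose value carries an anchor, including keys you might want to keep, so it only fits files where anchored top-level entries are pure templates.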
In my ruamel.yaml (which is YAML 1.2), in round-trip mode, I preserve the anchors and aliases for anchors that are actually used to alias mappings or sequences (anchors and aliases are currently not preserved for scalars, nor are "unused" anchors):
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()  # round-trip mode is the default
with open('example.yaml') as f:
    result = yaml.load(f)
yaml.dump(result, sys.stdout)
gives:
wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  <<: *wifi_params
  name: connectivity
test2:
  <<: *wifi_params
  name: connectivity_5ghz
You can then walk the mapping (or, recursively, the whole tree), find the node that carries the anchor and delete it, without knowing the key's name:
import sys
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib
yaml = ruamel.yaml.YAML()
with open('example.yaml') as f:
    result = yaml.load(f)
keys_to_delete = []
for k in result:
    v = result[k]
    if v.yaml_anchor():  # this value defines an anchor, so remember its key
        keys_to_delete.append(k)
    for merge_data in v.merge:  # update the dict with the merge data
        v.update(merge_data[1])
        delattr(v, merge_attrib)  # drop the merge info so "<<" is not dumped again
# delete after the loop, so the mapping is not modified while iterating over it
for k in keys_to_delete:
    del result[k]
yaml.dump(result, sys.stdout)
gives:
test1:
  name: connectivity
  ssid: 1
  key: 2
test2:
  name: connectivity_5ghz
  ssid: 1
  key: 2
Doing this generically and recursively (i.e. for anchors and aliases anywhere in the tree) is possible as well. The update would be as easy as above, but you would need to keep track of how to delete each anchored node, which does not have to be a mapping value: it could also be a sequence item or a scalar.
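To illustrate why "anywhere in the tree" needs bookkeeping, here is a minimal sketch using plain PyYAML rather than ruamel.yaml. It relies on the fact that yaml.compose() resolves an alias to the very same node object as the anchored node, so any node reached more than once during a walk must have been anchored and aliased. The function name shared_nodes is hypothetical, and anchors that are never aliased are not detected this way.

```python
import yaml

EXAMPLE = """\
wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  name: connectivity
  <<: *wifi_params
test2:
  name: connectivity_5ghz
  <<: *wifi_params
"""


def shared_nodes(text):
    """Return the composed nodes that are referenced more than once (hypothetical helper)."""
    root = yaml.compose(text)
    counts = {}  # id(node) -> number of times the walk reached it
    nodes = {}   # id(node) -> node, which also keeps the ids stable

    def walk(node):
        counts[id(node)] = counts.get(id(node), 0) + 1
        nodes[id(node)] = node
        if counts[id(node)] > 1:
            return  # already visited: do not descend again (also guards against cycles)
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                walk(key_node)
                walk(value_node)
        elif isinstance(node, yaml.SequenceNode):
            for item in node.value:
                walk(item)

    walk(root)
    return [nodes[i] for i, n in counts.items() if n > 1]


for node in shared_nodes(EXAMPLE):
    # for a mapping node, show its keys; a shared scalar would show its value
    if isinstance(node, yaml.MappingNode):
        print(type(node).__name__, [k.value for k, _ in node.value])
    else:
        print(type(node).__name__, node.value)
```

The walk only finds the shared nodes. Removing them still requires remembering each parent and the position inside it, which is the tracking problem described above.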
I wanted to do this today too and instead of switching to ruamel.yaml like @Anthon suggests, I found the pyyaml-keep-anchors repository instead, which allowed me to continue using pyyaml. Here's the example from that repo, which worked out of the box for me.
import yaml
from yaml_keep_anchors.yaml_anchor_parser import AliasResolverYamlLoader
with open('example/example.yaml', 'r') as fh:
    data = yaml.load(fh, Loader=AliasResolverYamlLoader)
assert data['key_three'].anchor_name == 'anchor'
assert data['key_two']['sub_key'].anchor_name == 'anchor_val'
Updated example, to show the author of ruamel.yaml that scalars can indeed be checked to see whether they are aliases.
YAML file:
  wifi_parm: &wifi_params
    ssid: 1
    key: &key some_key_here
  test1:
    name: connectivity
    key: *key
  test2:
    name: connectivity_5ghz
    key: *key
Python code:
import yaml
from yaml_keep_anchors.yaml_anchor_parser import AliasResolverYamlLoader
with open('test.yaml', 'r') as f:
    result = yaml.load(f, Loader=AliasResolverYamlLoader)
print(result["test1"]["key"].__dict__)
This prints
{'_wrapped': 'some_key_here', '_anchor': 'key'}
because the referenced key is an alias.