How to set up namespaces for Azure tts (mstts)

Question

I need to create this header for Azure TTS:

 <speak version="1.0" 
  xmlns="https://www.w3.org/2001/10/synthesis" 
   xmlns:mstts="https://www.w3.org/2001/mstts" 
   xml:lang="en-US">

This is the code that works to create the xml:lang key:

xml_body = ElementTree.Element('speak', version='1.0')
xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')

I've tried to create the xmlns:mstts without success. This doesn't work:

xml_body.set('{https://www.w3.org/2001/10/synthesis}mstts', 'https://www.w3.org/2001/mstts' )

because this produces the following output:

<speak
 xmlns:ns0="https://www.w3.org/2001/10/synthesis"
 version="1.0"
 xml:lang="en-us"
 ns0:mstts="https://www.w3.org/2001/mstts" />

Note the xmlns:ns0 and ns0:mstts problems in the attributes on the <speak> element.

Any ideas?

Martijn Pieters · Accepted Answer

You need to give the speak element a namespace, as well as its version attribute; that is what the xmlns="..." attribute normally configures. Use the {<namespaceuri>}<tagname> qualified name format for this, just like you do for the xml:lang attribute:

xml_body = ElementTree.Element('{https://www.w3.org/2001/10/synthesis}speak')
xml_body.set('{https://www.w3.org/2001/10/synthesis}version', '1.0')
xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')

You can also add the attributes as a dictionary, passed in as the second argument:

xml_body = ElementTree.Element(
    '{https://www.w3.org/2001/10/synthesis}speak', {
        '{https://www.w3.org/2001/10/synthesis}version': '1.0',
        '{http://www.w3.org/XML/1998/namespace}lang': 'en-us'
    })

You do not need to set the xmlns:mstts attribute, however! ElementTree will add this attribute automatically, as needed, based on what namespaces have been used in the XML tree you build.
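To illustrate the automatic declarations, here is a minimal sketch using the two namespace URIs from the question. The constant names SYNTHESIS and MSTTS are only illustrative, and the generated prefixes depend on what has been registered, so they are not spelled out here:

```python
from xml.etree import ElementTree as ET

# Namespace URIs as used in the question; the constant names are illustrative.
SYNTHESIS = "https://www.w3.org/2001/10/synthesis"
MSTTS = "https://www.w3.org/2001/mstts"

# Only the synthesis namespace is in use, so only that one gets declared.
speak = ET.Element(f"{{{SYNTHESIS}}}speak")
before = ET.tostring(speak, encoding="unicode")

# Using the mstts namespace anywhere in the tree adds its declaration as well.
ET.SubElement(speak, f"{{{MSTTS}}}express-as")
after = ET.tostring(speak, encoding="unicode")

print(before)
print(after)
```

The second string contains an xmlns:... declaration for the mstts URI even though no such attribute was ever set by hand.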

You do want to register this namespace with ElementTree, using the register_namespace() function:

ElementTree.register_namespace('mstts', 'https://www.w3.org/2001/mstts')

This tells ElementTree that, when serialising the XML tree to a file or string, tag names and attributes in the {https://www.w3.org/2001/mstts} namespace should use mstts as the prefix. If you don't register the namespace, a namespace prefix will be generated for you (ns0:, ns1:, etc.). That still results in perfectly valid XML: namespace prefixes are document-local, and are just shorthand for the full namespace URI. Any compliant XML parser will handle ns0 as the prefix for the https://www.w3.org/2001/mstts namespace URI exactly the same as mstts.
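A small sketch of both points, assuming nothing beyond the standard library: registering the prefix changes the serialised name, and a parser resolves ns0 and mstts to the same qualified tag. The two hand-written XML strings are made up for the comparison:

```python
from xml.etree import ElementTree as ET

MSTTS = "https://www.w3.org/2001/mstts"

# Registration is global to the ElementTree module and affects serialisation only.
ET.register_namespace("mstts", MSTTS)

element = ET.Element(f"{{{MSTTS}}}express-as")
serialised = ET.tostring(element, encoding="unicode")
print(serialised)

# Two documents that differ only in the prefix chosen for the same URI.
generated = f'<ns0:express-as xmlns:ns0="{MSTTS}" />'
registered = f'<mstts:express-as xmlns:mstts="{MSTTS}" />'

# After parsing, both elements carry the same {uri}localname tag.
same_tag = ET.fromstring(generated).tag == ET.fromstring(registered).tag
print(same_tag)
```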

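For a default namespace (such as the https://www.w3.org/2001/10/synthesis namespace for the root <speak> element), use the default_namespace argument for the ElementTree.write() method. Note that when default_namespace is set, ElementTree raises a ValueError for any tag or attribute name in the tree that is not namespace-qualified, which is why the version attribute is qualified above.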
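As a rough sketch of how default_namespace behaves, the following writes to an in-memory buffer instead of a file. The serialise() helper is a hypothetical convenience wrapper, not part of ElementTree:

```python
import io
from xml.etree import ElementTree as ET

SYNTHESIS = "https://www.w3.org/2001/10/synthesis"


def serialise(element, default_namespace=None):
    """Hypothetical helper: write the element to a string via ElementTree.write()."""
    buffer = io.StringIO()
    ET.ElementTree(element).write(
        buffer, encoding="unicode", default_namespace=default_namespace
    )
    return buffer.getvalue()


# Tag and attribute are both qualified, so they are written without a prefix.
qualified = ET.Element(f"{{{SYNTHESIS}}}speak", {f"{{{SYNTHESIS}}}version": "1.0"})
with_default = serialise(qualified, default_namespace=SYNTHESIS)
print(with_default)

# An unqualified attribute name is rejected when default_namespace is used.
unqualified = ET.Element(f"{{{SYNTHESIS}}}speak", {"version": "1.0"})
try:
    serialise(unqualified, default_namespace=SYNTHESIS)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```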

I find it easier to use QName() objects to handle namespaced attribute and tag names; together with variables for the specific namespaces, that makes typos less likely. Here is a more full-fledged example, based on an example from the Azure SSML documentation:

from xml.etree import ElementTree as ET
from functools import partial

ns = {
    "synthesis": "https://www.w3.org/2001/10/synthesis",
    "mstts": "https://www.w3.org/2001/mstts",
    "xml": "http://www.w3.org/XML/1998/namespace",
}
ET.register_namespace('mstts', ns['mstts'])

synthesis = partial(ET.QName, ns["synthesis"])
mstts = partial(ET.QName, ns["mstts"])
xml_ = partial(ET.QName, ns["xml"])

xml_body = ET.Element(synthesis('speak'), {
    synthesis('version'): '1.0',
    xml_('lang'): 'en-us',
})
voice = ET.SubElement(xml_body, synthesis('voice'), {
    synthesis('name'): 'en-US-JessaNeural'})
express_as = ET.SubElement(voice, mstts('express-as'), {
    mstts('type'): 'cheerful'})
express_as.text = "That'd be just amazing!"

root = ET.ElementTree(xml_body)
root.write("filename.xml", encoding="UTF-8", default_namespace=ns["synthesis"])

The above produces the following XML (manually pretty-printed for easy reading):

<speak
 xmlns="https://www.w3.org/2001/10/synthesis"
 xmlns:mstts="https://www.w3.org/2001/mstts"
 version="1.0"
 xml:lang="en-us">
    <voice name="en-US-JessaNeural">
        <mstts:express-as mstts:type="cheerful">
            That'd be just amazing!
        </mstts:express-as>
    </voice>
</speak>
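If the partial(ET.QName, ...) helpers look opaque, this short sketch shows what one of them produces. It only restates the pattern used above; nothing here is specific to Azure:

```python
from functools import partial
from xml.etree import ElementTree as ET

MSTTS = "https://www.w3.org/2001/mstts"

# partial() pre-fills the namespace URI, so only the local name is passed in.
mstts = partial(ET.QName, MSTTS)

name = mstts("express-as")
print(name.text)  # {https://www.w3.org/2001/mstts}express-as

# QName objects can be used wherever a qualified tag or attribute name is expected.
element = ET.Element(name, {mstts("type"): "cheerful"})
serialised = ET.tostring(element, encoding="unicode")
print(serialised)
```

The QName object is just a wrapper around the {uri}localname string, so it is interchangeable with the hand-written qualified names shown earlier.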

You may also want to look at the external lxml library, as it includes an lxml.builder.ElementMaker class that makes working with namespaces easier still.

lxml has much better namespace support in general, and attributes of an element that already has a namespace don't need to be explicitly qualified with a namespace themselves. You can mark a given namespace mapping as the default by using the prefix None in a dictionary setting up the namespaces:

from lxml import etree as ET
from lxml.builder import ElementMaker

ns = {
    None: "https://www.w3.org/2001/10/synthesis",
    "mstts": "https://www.w3.org/2001/mstts",
}

E = ElementMaker(namespace=ns[None], nsmap=ns)
TTS = ElementMaker(namespace=ns['mstts'])

xml_body = E.speak(
    {"version": "1.0",
     "{http://www.w3.org/XML/1998/namespace}lang": "en-US"},
    E.voice(
        {"name": "en-US-JessaNeural"},
        TTS(
            "express-as",  # hyphenated tag names need the call form
            "That'd be just amazing!",
            type="cheerful",
        )
    )
)

In the above, using either E.tagname(...) or E('tagname', ...) will create an element with the https://www.w3.org/2001/10/synthesis namespace URI, while the TTS object creates tags with the https://www.w3.org/2001/mstts namespace URI. The call form is needed for tag names that are not valid Python identifiers, such as express-as. Because we gave E a namespace map with None mapping to the https://www.w3.org/2001/10/synthesis namespace URI, that URI will be used as the default namespace, and tag names in that namespace will not be prefixed.

You can pass in attributes either as keyword arguments (E.tagname(..., attributename="value")) or via a dictionary passed in as a positional argument. Any contained elements can simply be added as positional arguments, including text. You can also use the normal Element methods (e.g. Element.append() to add a child element, or Element.text = ... to set the text content of a tag).
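As a sketch of the step-by-step style, the same tree can be assembled with keyword attributes, the E('tagname', ...) call form, and the regular Element methods. This assumes lxml is installed; the variable names are illustrative:

```python
from lxml import etree
from lxml.builder import ElementMaker

SYNTHESIS = "https://www.w3.org/2001/10/synthesis"
MSTTS = "https://www.w3.org/2001/mstts"

E = ElementMaker(namespace=SYNTHESIS, nsmap={None: SYNTHESIS, "mstts": MSTTS})
TTS = ElementMaker(namespace=MSTTS)

# Attributes as keyword arguments; xml:lang needs its qualified name, so use set().
speak = E.speak(version="1.0")
speak.set("{http://www.w3.org/XML/1998/namespace}lang", "en-US")

# The call form takes the tag name as a string, which allows hyphenated names.
voice = E("voice", name="en-US-JessaNeural")
express_as = TTS("express-as", type="cheerful")

# Regular Element API: set the text, then attach children to their parents.
express_as.text = "That'd be just amazing!"
voice.append(express_as)
speak.append(voice)

result = etree.tostring(speak, encoding="unicode")
print(result)
```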

Demo using lxml:

>>> from lxml import etree as ET
>>> from lxml.builder import ElementMaker
>>> ns = {
...     None: "https://www.w3.org/2001/10/synthesis",
...     "mstts": "https://www.w3.org/2001/mstts",
... }
>>> E = ElementMaker(namespace=ns[None], nsmap=ns)
>>> TTS = ElementMaker(namespace=ns['mstts'])
>>> xml_body = E.speak(
...     {"version": "1.0",
...      "{http://www.w3.org/XML/1998/namespace}lang": "en-US"},
...     E.voice(
...         {"name": "en-US-JessaNeural"},
...         TTS(
...             "express-as",
...             "That'd be just amazing!",
...             type="cheerful",
...         )
...     )
... )
>>> print(ET.tostring(xml_body, encoding="unicode", pretty_print=True))
<speak xmlns="https://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
  <voice name="en-US-JessaNeural">
    <mstts:express-as type="cheerful">That'd be just amazing!</mstts:express-as>
  </voice>
</speak>

Tags: python, xml, xml-namespaces, elementtree

rodbs
