
python lxml get parent element when you know child text with xpath

I have the following xml file: test.xml

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.CountryCode</FieldName>
            <FieldValue>DE</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.Address.PostalCode</FieldName>
            <FieldValue>10827</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.LocationTypeCode</FieldName>
            <FieldValue>O</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.Address.City</FieldName>
            <FieldValue>Berlin</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

I want to get the TxnField element whose FieldName child has the text Pickup.DateTime. I need the parent element itself, so the result should be:

<TxnField>
  <FieldName>Pickup.DateTime</FieldName>
  <FieldValue>2016-05-28T03:26:05</FieldValue>
  <FieldIndex>0</FieldIndex>
</TxnField>

What I have so far is the following:

from lxml import etree
xml_parser = etree.XMLParser(remove_blank_text=True)
xml_tree = etree.parse('test.xml', xml_parser)

p_time = xml_tree.xpath("//*[local-name()='TxnField']/*[text()='Pickup.DateTime']")
print(p_time[0].tag) # {http://www.someaddress.com/someendpoint}FieldName

But this returns the FieldName element that contains the text Pickup.DateTime, whereas I want its TxnField parent, as shown above.

As a side note: it took me almost an hour even to get this far because I find the lxml documentation to be very cumbersome. If anyone has a link with a good tutorial please post it at least as a comment. Thanks!

skamsie asked Sep 14 '25 11:09


1 Answer

Here is a suggestion:

from lxml import etree

NSMAP = {"s": "http://www.someaddress.com/someendpoint"}

xml_parser = etree.XMLParser(remove_blank_text=True)
xml_tree = etree.parse('test.xml', xml_parser)

p_time = xml_tree.xpath("//s:FieldName[.='Pickup.DateTime']", namespaces=NSMAP)[0]
parent = p_time.getparent()
  • The s prefix is bound to the http://www.someaddress.com/someendpoint namespace through the NSMAP dictionary. It is used in the XPath expression instead of local-name().
  • The call to xpath() returns a list with one item (the wanted FieldName element). Indexing with [0] picks that element, and its getparent() method then returns the enclosing TxnField.
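
To try these steps without a test.xml file on disk, here is a minimal self-contained sketch. It assumes a trimmed copy of the question's XML held in a bytes constant; the names XML, matches, parent_tag and field_value are illustrative, not from the question. It also checks for an empty result before indexing, since [0] on an empty list raises IndexError.

```python
from lxml import etree

# Trimmed copy of the question's document (assumption: two fields are enough
# to show the lookup; the real file has five).
XML = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.PostalCode</FieldName>
            <FieldValue>10827</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

# Bind a prefix of our choosing to the document's default namespace.
NSMAP = {"s": "http://www.someaddress.com/someendpoint"}

root = etree.fromstring(XML, etree.XMLParser(remove_blank_text=True))

# Find the FieldName element by its text, then step up to its parent.
matches = root.xpath("//s:FieldName[.='Pickup.DateTime']", namespaces=NSMAP)
if not matches:
    raise LookupError("no FieldName with text 'Pickup.DateTime' found")
parent = matches[0].getparent()

# Inspect the result: local tag name and the sibling FieldValue text.
parent_tag = etree.QName(parent).localname
field_value = parent.findtext("s:FieldValue", namespaces=NSMAP)
print(parent_tag, field_value)

# Serialize the parent element for a visual check.
print(etree.tostring(parent, pretty_print=True).decode())
```

The serialized TxnField will carry an xmlns declaration, because the element lives in the document's default namespace.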

There is more than one way to do it!
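
For instance, the parent can also be selected directly in XPath, without calling getparent(). The sketch below shows three variants that I believe select the same TxnField for this kind of document: a predicate on the child, a parent step (..), and a local-name() form close to the question's attempt. The shortened inline XML and the variable names are illustrative assumptions.

```python
from lxml import etree

NS = "http://www.someaddress.com/someendpoint"
NSMAP = {"s": NS}

# Shortened stand-in for test.xml (assumption: only the namespaced
# DataFields subtree matters for the lookup).
XML = (
    '<DataFields xmlns="http://www.someaddress.com/someendpoint">'
    "<TxnField><FieldName>Pickup.Address.City</FieldName>"
    "<FieldValue>Berlin</FieldValue></TxnField>"
    "<TxnField><FieldName>Pickup.DateTime</FieldName>"
    "<FieldValue>2016-05-28T03:26:05</FieldValue></TxnField>"
    "</DataFields>"
)
root = etree.fromstring(XML)

# 1. Select TxnField elements that have a matching FieldName child.
by_predicate = root.xpath(
    "//s:TxnField[s:FieldName='Pickup.DateTime']", namespaces=NSMAP
)

# 2. Select the FieldName, then step to its parent with '..'.
by_parent_step = root.xpath(
    "//s:FieldName[.='Pickup.DateTime']/..", namespaces=NSMAP
)

# 3. Namespace-agnostic form using local-name(), no prefix mapping needed.
by_local_name = root.xpath(
    "//*[local-name()='TxnField']"
    "[*[local-name()='FieldName' and text()='Pickup.DateTime']]"
)

for result in (by_predicate, by_parent_step, by_local_name):
    element = result[0]
    print(etree.QName(element).localname,
          element.findtext("s:FieldValue", namespaces=NSMAP))
```

The first form reads most naturally to me when the parent is what you are after; the local-name() form trades precision for not having to declare the namespace.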

Btw, I think this is a pretty good lxml tutorial: http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/index.html

mzjn answered Sep 17 '25 01:09