 

Python-3.x SIMPLE XPath Library

I am trying to parse quite simple XML using Python.

Before Python 3 I used the "webscraping" library, which has XPath functionality. It was very simple to use:

xpath.search(xml_string, "//search")

This returns the elements found for the provided XPath query.

Now I have decided to switch to Python 3, and the library mentioned above doesn't work properly with it (even after running 2to3.py), so I decided to use the native xml.etree.ElementTree library.

I am probably misunderstanding something, but this is a proper nightmare. It doesn't work in a way where you simply pass the XML and an XPath query to a function and get the results back. Instead, you need 10+ lines of code, messing with an element's children and so on, and it still doesn't work...

import xml.etree.ElementTree as ET
doc = ET.fromstring(xml)
result = doc.findall("//XPath Query")

This raises SyntaxError: cannot use absolute path on element. Adding . in front of //XPath Query doesn't help much either.

Is there some reason why the ElementTree and lxml libraries are so complicated and don't allow you to SIMPLY use XPath, instead of messing around with elements, using a for loop every time, etc.?

Could anyone recommend a simple library for Python 3 that will just take an XPath query and return the result?

Skylight asked Feb 28 '26 00:02


2 Answers

Using the example xml from http://docs.python.org/2/library/xml.etree.elementtree.html, searching seems to work fine:

>>> import xml.etree.ElementTree as ET
>>> xml = """..."""
>>> doc = ET.fromstring(xml)
>>> doc.findall(".//rank")
[<Element 'rank' at 0x10199ebd0>, <Element 'rank' at 0x10199e210>, <Element 'rank' at 0x10199e4d0>]

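Or if you want to search from the root explicitly (note that ElementTree emits a FutureWarning for a path with a leading slash here and suggests './/rank' instead):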

>>> ET.ElementTree(doc).findall('//rank')
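
To make the difference between the failing and the working call concrete, here is a minimal, self-contained sketch. The sample document and the variable names are made up for illustration and are not the XML from the linked documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = """
<data>
  <country name="A"><rank>1</rank></country>
  <country name="B"><rank>4</rank></country>
</data>
"""

root = ET.fromstring(SAMPLE)

# Calling findall() on an Element with an absolute path raises SyntaxError,
# which is the error reported in the question.
try:
    root.findall("//rank")
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)

# A relative path starting with ".//" searches all descendants of the element.
ranks = [el.text for el in root.findall(".//rank")]

print(absolute_path_error)
print(ranks)
```

The path is evaluated relative to the element findall() is called on, so the leading dot is what makes the query valid.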

Fredrik Håård answered Mar 01 '26 17:03


Found the problem now.

My XML response contains the following:

<?xml version="1.0" encoding="utf-8"?>
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <!-- Call-specific Output Fields -->
  <HasMoreOrders> boolean </HasMoreOrders>
  <OrderArray> OrderArrayType
    <Order> OrderType
      <AdjustmentAmount currencyID="CurrencyCodeType"> AmountType (double) </AdjustmentAmount>
      <AmountPaid currencyID="CurrencyCodeType"> AmountType (double) </AmountPaid>
      <AmountSaved currencyID="CurrencyCodeType"> AmountType (double) </AmountSaved>
      <BuyerCheckoutMessage> string </BuyerCheckoutMessage>
      <BuyerUserID> UserIDType (string) </BuyerUserID>
      <CheckoutStatus> CheckoutStatusType
      ...

After parsing that XML:

tree = ET.fromstring(xml)
result = tree.findall("*")

Every element it returns has a tag prefixed with {urn:ebay:apis:eBLBaseComponents}.
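
A small sketch of what that looks like in practice. The XML below is a trimmed, hypothetical stand-in for the real response (the element values are invented); only the namespace URI and element names come from the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the real response; values are invented.
XML = """
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <HasMoreOrders>false</HasMoreOrders>
  <OrderArray>
    <Order>
      <BuyerCheckoutMessage>Please ship fast</BuyerCheckoutMessage>
    </Order>
  </OrderArray>
</GetOrdersResponse>
"""

root = ET.fromstring(XML)

# "*" selects the direct children of the element; each tag is stored as
# "{namespace-uri}localname" because of the default xmlns declaration.
tags = [child.tag for child in root.findall("*")]
for tag in tags:
    print(tag)
```

The xmlns attribute on the root element applies to every element beneath it, which is why none of the plain tag names match.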

For example, if I need to search for <BuyerCheckoutMessage>:

result = tree.findall(".//BuyerCheckoutMessage")

it returns nothing, because that element's tag is actually {urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage.

Therefore, to search for elements, I need to put {urn:ebay:apis:eBLBaseComponents} in front of every element name in the XPath query in order to retrieve my element.

So the solution is to use:

result = tree.findall(".//{urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage")

result[0].text then returns the element's value.
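
As a possible alternative to spelling out the full URI each time, findall() also accepts a prefix-to-URI mapping as its second argument, and Python 3.8 and later accept a {*} namespace wildcard. The sketch below assumes a trimmed, hypothetical response with invented values; the prefix "ebay" is an arbitrary choice for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the real response; values are invented.
XML = """
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <OrderArray>
    <Order>
      <BuyerCheckoutMessage>Please ship fast</BuyerCheckoutMessage>
    </Order>
  </OrderArray>
</GetOrdersResponse>
"""

root = ET.fromstring(XML)

# Map a prefix of our own choosing to the namespace URI from the document.
NS = {"ebay": "urn:ebay:apis:eBLBaseComponents"}
with_prefix = root.findall(".//ebay:BuyerCheckoutMessage", NS)

# Python 3.8+ only: "{*}" matches the element name in any namespace.
with_wildcard = root.findall(".//{*}BuyerCheckoutMessage")

print([el.text for el in with_prefix])
print([el.text for el in with_wildcard])
```

The prefix only has to match the key in the mapping; it does not need to appear in the XML itself.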

Why it doesn't just work like ET.search(xml, "XPath-query") is the biggest mystery to me. So much time wasted.
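
For what it's worth, a one-call helper along those lines can be sketched on top of ElementTree. xpath_search is a hypothetical name, not part of the standard library, and it only supports the limited XPath subset that ElementTree itself understands. The sample XML and its values are invented.

```python
import xml.etree.ElementTree as ET


def xpath_search(xml_text, query, namespaces=None):
    """Hypothetical helper: parse xml_text and return elements matching query."""
    root = ET.fromstring(xml_text)
    # Elements reject absolute paths, so turn a leading "//" into ".//".
    # Caveat: ".//" searches descendants only, so the root element itself
    # is never returned, unlike a true document-level "//" query.
    if query.startswith("//"):
        query = "." + query
    return root.findall(query, namespaces)


# Hypothetical sample input with invented values.
XML = """
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <OrderArray>
    <Order>
      <BuyerCheckoutMessage>Please ship fast</BuyerCheckoutMessage>
    </Order>
  </OrderArray>
</GetOrdersResponse>
"""

found = xpath_search(
    XML,
    "//e:BuyerCheckoutMessage",
    {"e": "urn:ebay:apis:eBLBaseComponents"},
)
print([el.text for el in found])
```

The namespace still has to be supplied somewhere; the helper only hides the parsing step and the leading-slash adjustment.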

Skylight answered Mar 01 '26 18:03


