How to parse xml from requests?

Question

Here's my code, which you can run without any API key:

import requests

r = requests.get('http://api.worldbank.org/v2/country/GBR/indicator/NY.GDP.MKTP.KD.ZG')

If I print r.text, I get a string that starts with

'\ufeff<?xml version="1.0" encoding="utf-8"?>
<wb:data page="1" pages="2" per_page="50" total="60" sourceid="2" lastupdated="2019-12-20" xmlns:wb="http://www.worldbank.org">
  <wb:data>
    <wb:indicator id="NY.GDP.MKTP.KD.ZG">GDP growth (annual %)</wb:indicator>
    <wb:country id="GB">United Kingdom</wb:country>
    <wb:countryiso3code>GBR</wb:countryiso3code>
    <wb:date>2019</wb:date>
...
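The leading \ufeff is a UTF-8 byte-order mark. Here is a minimal sketch of how the choice of codec affects it. It uses a short, hand-written byte string as a stand-in for r.content and assumes the real body starts the same way:

```python
# Hand-written stand-in for the first bytes of r.content (not the real response)
raw = '\ufeff<?xml version="1.0" encoding="utf-8"?>'.encode("utf-8")

# Plain "utf-8" keeps the byte-order mark as the first character
with_bom = raw.decode("utf-8")

# "utf-8-sig" drops a leading byte-order mark if one is present
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom[:6]))     # '\ufeff<?xml'
print(repr(without_bom[:5]))  # '<?xml'
```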

My current workaround uses a regex, which is generally discouraged for parsing XML:

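import re

import pandas as pd

# r is the response from the snippet above; each <wb:date> is followed,
# after a newline and indentation, by its <wb:value>
pd.DataFrame(
    re.findall(
        r"<wb:date>(\d{4})</wb:date>\s*<wb:value>((?:\d\.)?\d{14})", r.text
    ),
    columns=["date", "value"],
)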
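This small sketch shows why the regex route is fragile. It runs the same pattern, with \s* standing in for the newline and indentation between the two tags, against a hypothetical snippet. The negative value and the empty <wb:value /> element are made up for illustration. Both are skipped silently instead of raising an error:

```python
import re

# The question's pattern, with \s* matching the newline and indentation
# that sit between </wb:date> and <wb:value>
PATTERN = r"<wb:date>(\d{4})</wb:date>\s*<wb:value>((?:\d\.)?\d{14})"

# Hypothetical snippet: one value in the expected 1.xxxxxxxxxxxxxx shape,
# one made-up negative value, and one empty element (a year with no data)
snippet = """
<wb:date>2018</wb:date>
<wb:value>1.38567356958762</wb:value>
<wb:date>2009</wb:date>
<wb:value>-4.2</wb:value>
<wb:date>1960</wb:date>
<wb:value />
"""

result = re.findall(PATTERN, snippet)
# Only the first record matches; the other two are dropped without any error
print(result)  # [('2018', '1.38567356958762')]
```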

What is a "proper" way to parse this XML output? My final objective is a DataFrame with date and value columns, such as:

    date    value
0   2018    1.38567356958762
1   2017    1.89207703836381
2   2016    1.91815510596298
3   2015    2.35552430595799
...
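One standard-library option is xml.etree.ElementTree with a prefix-to-URI mapping, so the wb: tags can be addressed directly. The sketch below assumes the response keeps the structure shown above: a root <wb:data> whose <wb:data> children each hold a <wb:date> and a <wb:value>. The sample body is a shortened, hand-written version of that structure, and parse_date_value is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Prefix -> URI mapping, taken from the xmlns:wb attribute of the response
NS = {"wb": "http://www.worldbank.org"}


def parse_date_value(xml_bytes):
    """Return (date, value) pairs from a World Bank-style XML body (bytes)."""
    # utf-8-sig strips the leading byte-order mark before parsing
    root = ET.fromstring(xml_bytes.decode("utf-8-sig"))
    rows = []
    # Each record is a <wb:data> child of the root <wb:data> element
    for record in root.findall("wb:data", NS):
        date = record.findtext("wb:date", namespaces=NS)
        value = record.findtext("wb:value", namespaces=NS)
        # An empty <wb:value /> (no data for that year) gives "" -> None
        rows.append((date, float(value) if value else None))
    return rows


# Shortened, hand-written sample in the same shape as the response
sample = """\ufeff<?xml version="1.0" encoding="utf-8"?>
<wb:data page="1" pages="1" xmlns:wb="http://www.worldbank.org">
  <wb:data>
    <wb:date>2018</wb:date>
    <wb:value>1.38567356958762</wb:value>
  </wb:data>
  <wb:data>
    <wb:date>2017</wb:date>
    <wb:value>1.89207703836381</wb:value>
  </wb:data>
</wb:data>""".encode("utf-8")

rows = parse_date_value(sample)
print(rows)  # [('2018', 1.38567356958762), ('2017', 1.89207703836381)]
```

With pandas, pd.DataFrame(parse_date_value(r.content), columns=["date", "value"]) should then give the two-column frame above. The response in the question reports pages="2" and per_page="50", so one request only covers the first 50 of the 60 records. The API appears to accept a per_page query parameter, e.g. requests.get(url, params={"per_page": 100}), to fetch everything at once.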

Omri · Accepted Answer

How about the following:

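Decode the response, using 'utf-8-sig' so that the leading byte-order mark (\ufeff) is dropped:

decoded_response = r.content.decode('utf-8-sig')

Convert it to a JSON string (this needs "import json" and the third-party xmltodict package):

response_json = json.dumps(xmltodict.parse(decoded_response))

Read it into a DataFrame (this needs "import io" and "import pandas as pd"; recent pandas versions expect literal JSON to be wrapped in a file-like object such as io.StringIO):

pd.read_json(io.StringIO(response_json))

Then you just need to adjust orient and the other options to get the shape you want (docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html)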
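If you take the xmltodict route, the date and value fields are easier to reach from the parsed dict than through read_json's orient options. The sketch below assumes xmltodict's default output shape: prefixed tag names such as "wb:data" are kept as keys, attributes get an "@" prefix, and repeated elements are collected into a list. The dict is hand-written to stand in for xmltodict.parse(decoded_response), so the snippet runs without the package:

```python
# Hand-written stand-in for xmltodict.parse(decoded_response), shortened to
# two records and to the two fields of interest (real records have more keys)
parsed = {
    "wb:data": {
        "@page": "1",
        "@xmlns:wb": "http://www.worldbank.org",
        "wb:data": [
            {"wb:date": "2018", "wb:value": "1.38567356958762"},
            {"wb:date": "2017", "wb:value": "1.89207703836381"},
        ],
    }
}

# The records sit in the list under the inner "wb:data" key
records = parsed["wb:data"]["wb:data"]

# Keep only date and value (xmltodict turns an empty <wb:value /> into None)
rows = [(rec["wb:date"], rec["wb:value"]) for rec in records]
print(rows)  # [('2018', '1.38567356958762'), ('2017', '1.89207703836381')]
```

pd.DataFrame(rows, columns=["date", "value"]) then gives the two columns from the question. pd.to_numeric can convert the value column from strings to floats, turning missing values into NaN.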

Tags: python, pandas, xml, python-requests

Asked by ignoring_gravity