Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Associating units with Pandas DataFrame

Tags:

python

pandas

csv

I'm using a web service that returns a CSV response in which the 1st row contains the column names, and the 2nd row contains the column units, for example:

longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114

I can read this into a Pandas DataFrame:

import pandas as pd
from StringIO import StringIO

x = '''
longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114
'''

# Create a Pandas DataFrame
obs=pd.read_csv(StringIO(x.strip()), sep=",\s*")

print(obs)

which produces

      longitude       latitude
0  degrees_east  degrees_north
1      -142.842          -1.82
2       -25.389          39.87
3       -37.704         27.114

But what would be the best approach to associate the units with the DataFrame columns for later use, for example labeling plots?

like image 612
Rich Signell Avatar asked Sep 06 '25 03:09

Rich Signell


1 Answers

Allowing pandas to read the second line as data is screwing up the dtype for the columns. Instead of a float dtype, the presence of strings make the dtype of the columns object, and the underlying objects, even the numbers, are strings. This screws up all numerical operations:

In [8]: obs['latitude']+obs['longitude']
Out[8]: 
0    degrees_northdegrees_east
1                -1.82-142.842
2                 39.87-25.389
3                27.114-37.704

In [9]: obs['latitude'][1]
Out[9]: '-1.82'

So it is imperative that pd.read_csv skip the second line.

The following is pretty ugly, but given the format of the input, I don't see a better way.

import pandas as pd
from StringIO import StringIO

x = '''
longitude,latitude
degrees_east,degrees_north
-142.842,-1.82
-25.389,39.87
-37.704,27.114
'''

content = StringIO(x.strip())

def read_csv(content):
    columns = next(content).strip().split(',')
    units = next(content).strip().split(',')
    obs = pd.read_table(content, sep=",\s*", header=None)
    obs.columns = ['{c} ({u})'.format(c=col, u=unit)
                   for col, unit in zip(columns, units)]
    return obs

obs = read_csv(content)
print(obs)
#    longitude (degrees_east)  latitude (degrees_north)
# 0                  -142.842                    -1.820
# 1                   -25.389                    39.870
# 2                   -37.704                    27.114
print(obs.dtypes)
# longitude (degrees_east)    float64
# latitude (degrees_north)    float64
like image 130
unutbu Avatar answered Sep 07 '25 22:09

unutbu