I'm working with census data (using the Census package ). When I select variables with the census API, they pass through in their raw format (e.g. B01001_007) and I'd like to replace the column name with the label (e.g. male 18 to 19 years).
I know this can be done through df.columns = ['male 18 to 19 years', 'male 20 years', 'male 21 years'] but this is tedious.
Is there a way to do some type of mapping that will auto-query into the header in my df below?
Sample data:
import pandas as pd
from pandas import DataFrame
variables_table = pd.DataFrame({'variable': ['B01001_007E','B01001_008E','B01001_009E'],
'label': ['male 18 to 19 years','male 20 years','male 21 years']
})
variables_table
label variable
male 18 to 19 years B01001_007E
male 20 years B01001_008E
male 21 years B01001_009E
Unclean output:
df = pd.DataFrame({'B01001_007E': ['100','200','300'],
'B01001_008E': ['300','200','100'],
'B01001_009E': ['500','100','200']})
df
B01001_007E B01001_008E B01001_009E
100 300 500
200 200 100
300 100 200
df.rename(columns=variables_table.set_index('variable')['label'])
Out:
male 18 to 19 years male 20 years male 21 years
0 100 300 500
1 200 200 100
2 300 100 200
Note that variables_table.set_index('variable')['label'] is a Series whose index is 'variable`. It will do the mapping on that index.
This is not an inplace operation. If you want to change the actual dataframe, assign it back to df: df = df.rename(columns=variables_table.set_index('variable')['label']) or use the inplace parameter: df.rename(columns=variables_table.set_index('variable')['label'], inplace=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With