Pandas: map values of categorical variable to a predefined list of dummy columns

Question

I have a categorical variable with known levels (e.g. hour, which only takes values from 0 to 23), but not all of them are present yet (say, we only have measurements between 0 and 11 o'clock, while hours 12 to 23 are not covered); the other values will be added later. If we naively use pandas.get_dummies() to map values to indicator variables, we end up with only 12 columns instead of 24. Is there a way to map the values of a categorical variable to a predefined list of dummy variables?
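Here is a minimal sketch of the problem, assuming only hours 0 to 11 have been observed so far. get_dummies creates one column per value it actually sees:

```python
import pandas as pd

# Only hours 0..11 have been observed so far
df = pd.DataFrame({'hour': list(range(12))})

# get_dummies creates one indicator column per *observed* value
naive = pd.get_dummies(df['hour'], prefix='hour')
print(naive.shape)  # (12, 12): hour_12 ... hour_23 are missing
```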

Here's an example of expected behaviour:

possible_values = range(24)
hours = get_dummies_on_steroids(df['hour'], prefix='hour', levels=possible_values)

Marius · Accepted Answer

Using the new and improved Categorical type in pandas 0.15:

import pandas as pd
import numpy as np
df = pd.DataFrame({'hour': [0, 1, 3, 8, 13, 14], 'val': np.random.randn(6)})
df
Out[4]: 
   hour       val
0     0 -0.098287
1     1 -0.682777
2     3  1.000749
3     8 -0.558877
4    13  1.423675
5    14  1.461552

df['hour_cat'] = pd.Categorical(df['hour'], categories=range(24))
pd.get_dummies(df['hour_cat'])
Out[6]: 
   0   1   2   3   4   5   6   7   8   9  ...  
0   1   0   0   0   0   0   0   0   0   0 ...      
1   0   1   0   0   0   0   0   0   0   0 ...   
2   0   0   0   1   0   0   0   0   0   0 ...   
3   0   0   0   0   0   0   0   0   1   0 ...   
4   0   0   0   0   0   0   0   0   0   0 ...   
5   0   0   0   0   0   0   0   0   0   0 ...
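In pandas 2.0 and later, get_dummies returns bool columns by default, so the output above would show True/False instead of 1/0. Below is a sketch of the same approach for current pandas. It adds the prefix from the question and asks for integer columns explicitly. It uses list(range(24)) for the categories to keep the category index a plain list of ints.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'hour': [0, 1, 3, 8, 13, 14], 'val': np.random.randn(6)})

# Declare all 24 possible hours as categories, whether observed or not
df['hour_cat'] = pd.Categorical(df['hour'], categories=list(range(24)))

# prefix names the columns hour_0 ... hour_23; dtype=int gives 0/1
# instead of the bool columns that pandas >= 2.0 returns by default
dummies = pd.get_dummies(df['hour_cat'], prefix='hour', dtype=int)
print(dummies.shape)  # (6, 24)
```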

The situation you describe, where you know your data can take a specific set of values, but you haven't necessarily observed all of them, is exactly what Categorical is good for.
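The get_dummies_on_steroids function from the question does not exist in pandas. Below is a hypothetical helper with that name and signature, sketched on top of the Categorical approach above. It converts with Series.astype and a CategoricalDtype, which has the same effect as pd.Categorical but keeps the Series' original index in the result.

```python
import pandas as pd

def get_dummies_on_steroids(series, prefix, levels):
    """Hypothetical helper (not part of pandas): one 0/1 column per entry
    in `levels`, whether or not that level occurs in `series`."""
    dtype = pd.CategoricalDtype(categories=list(levels))
    # Values not listed in `levels` become NaN and get an all-zero row
    cat = series.astype(dtype)
    return pd.get_dummies(cat, prefix=prefix, dtype=int)

possible_values = range(24)
morning = pd.DataFrame({'hour': [0, 5, 11]})
evening = pd.DataFrame({'hour': [12, 18, 23]})

a = get_dummies_on_steroids(morning['hour'], prefix='hour', levels=possible_values)
b = get_dummies_on_steroids(evening['hour'], prefix='hour', levels=possible_values)
print(list(a.columns) == list(b.columns))  # True: both have hour_0 ... hour_23
```

Because the columns come from the declared levels and not from the data, batches collected later produce the same 24 columns in the same order. One design choice to be aware of: a value outside levels (say, 25) is silently turned into NaN and yields a row of zeros. If that should be treated as an error, check series.isin(levels) before converting.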

Tags: python, pandas, dummy-data

Asked by ffriend
