 

Ordinal encoding in Pandas

Is there a way to have pandas.get_dummies output the numerical representation in one column rather than a separate column for each option?

Concretely, pandas.get_dummies currently gives me a column for every option:

Size    Size_Big  Size_Medium  Size_Small
Big            1            0           0
Medium         0            1           0
Small          0            0           1
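A minimal sketch that reproduces the table above, assuming a DataFrame with a single `Size` column (the frame is hypothetical and built only for illustration). Recent pandas versions return boolean columns from get_dummies by default, so `dtype=int` is passed to get the 1/0 values shown here.

```python
import pandas as pd

# Hypothetical frame matching the question's Size column
df = pd.DataFrame({"Size": ["Big", "Medium", "Small"]})

# One indicator column per distinct value; dtype=int gives 1/0
# instead of the True/False that recent pandas returns by default
dummies = pd.get_dummies(df["Size"], prefix="Size", dtype=int)
result = pd.concat([df, dummies], axis=1)
print(result)
```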

But I'm looking for more of the following output:

Size    Size_Numerical
Big                  1
Medium               2
Small                3
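When the set of options is small and fixed, the desired one-column result can also be sketched with an explicit dictionary and `Series.map`. The `size_codes` mapping below is a hypothetical one chosen to match the table above; pandas does not derive it on its own.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["Big", "Medium", "Small"]})

# Hypothetical mapping chosen to reproduce the desired table;
# any label missing from the dict would become NaN
size_codes = {"Big": 1, "Medium": 2, "Small": 3}
df["Size_Numerical"] = df["Size"].map(size_codes)
print(df)
```

The dictionary makes the numbering explicit, at the cost of having to list every label by hand.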
mikelowry asked Oct 28 '25 22:10



2 Answers

You don't want dummies, you want factors/categories.

Use pandas.factorize:

df['Size_Numerical'] = pd.factorize(df['Size'])[0] + 1  # codes start at 0, so shift by 1

output:

     Size  Size_Numerical
0     Big               1
1  Medium               2
2   Small               3
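Note that `pd.factorize` numbers labels in the order they first appear, which is why the output above happens to match the desired Big=1, Medium=2, Small=3. A small sketch with a reordered, hypothetical frame shows the difference, and how `sort=True` switches to sorted (alphabetical) order instead:

```python
import pandas as pd

# Hypothetical frame: same labels, but "Small" appears first
df = pd.DataFrame({"Size": ["Small", "Big", "Medium", "Small"]})

# Default: codes follow order of first appearance (Small=1, Big=2, Medium=3)
codes, uniques = pd.factorize(df["Size"])
df["By_Appearance"] = codes + 1

# sort=True: codes follow sorted order (Big=1, Medium=2, Small=3)
codes_sorted, uniques_sorted = pd.factorize(df["Size"], sort=True)
df["Alphabetical"] = codes_sorted + 1

print(df)
print(list(uniques))  # labels listed in code order
```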
mozway answered Oct 31 '25 12:10



I think scikit-learn's OneHotEncoder has the same issue: it expands the column into one new column per label. Use LabelEncoder instead:

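import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Sizes': ['Small', 'Medium', 'Large']})
le = preprocessing.LabelEncoder()
# classes are sorted alphabetically and coded from 0, so add 1
df['Category'] = le.fit_transform(df['Sizes']) + 1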

Outputs:

    Sizes  Category
0   Small         3
1  Medium         2
2   Large         1
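LabelEncoder assigns codes in sorted (alphabetical) order, which is why Large gets 1 and Small gets 3 above; the numbers do not follow size. If a specific order matters, one option is an ordered pandas Categorical, where the order is stated explicitly. The `size_order` list below is an assumption about the intended ranking:

```python
import pandas as pd

df = pd.DataFrame({"Sizes": ["Small", "Medium", "Large"]})

# Assumed semantic order, smallest to largest; listing it explicitly
# avoids the alphabetical Large < Medium < Small ordering
size_order = ["Small", "Medium", "Large"]
df["Sizes"] = pd.Categorical(df["Sizes"], categories=size_order, ordered=True)

# .cat.codes is 0-based (and -1 for values not in size_order), so shift by 1
df["Category"] = df["Sizes"].cat.codes + 1
print(df)
```

Because the column is ordered, comparisons such as `df['Sizes'] > 'Small'` also respect this ranking.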
Celius Stingher answered Oct 31 '25 10:10
