Is there a way to have pandas.get_dummies output the numerical representation in one column rather than a separate column for each option?
Concretely, pandas.get_dummies currently gives me a column for every option:
| Size | Size_Big | Size_Medium | Size_Small |
|---|---|---|---|
| Big | 1 | 0 | 0 |
| Medium | 0 | 1 | 0 |
| Small | 0 | 0 | 1 |
But I'm looking for output more like the following:
| Size | Size_Numerical |
|---|---|
| Big | 1 |
| Medium | 2 |
| Small | 3 |
You don't want dummies, you want factors/categories.
Use pandas.factorize, which assigns integer codes in order of first appearance (starting at 0, hence the + 1):
df['Size_Numerical'] = pd.factorize(df['Size'])[0] + 1
Output:
Size Size_Numerical
0 Big 1
1 Medium 2
2 Small 3
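One thing to watch: because factorize numbers labels by first appearance, the codes depend on row order. Here is a minimal sketch of that behaviour, plus one way to pin the mapping with pd.Categorical. The sample frame and the `size_order` list are assumptions made up for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical data where "Medium" appears before "Big".
df = pd.DataFrame({"Size": ["Medium", "Big", "Small", "Big"]})

# factorize codes follow order of first appearance:
# Medium -> 0, Big -> 1, Small -> 2 (then shifted by +1).
codes, uniques = pd.factorize(df["Size"])
df["Size_Numerical"] = codes + 1

# To get the same numbers regardless of row order, declare the
# categories explicitly (size_order is an assumed ordering).
size_order = ["Big", "Medium", "Small"]
cat = pd.Categorical(df["Size"], categories=size_order)
df["Size_Ordered"] = cat.codes + 1  # Big -> 1, Medium -> 2, Small -> 3

print(df)
```

With an explicit category list, any value not in the list gets code -1 (so 0 after the + 1), which makes unexpected labels easy to spot.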
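I think scikit-learn's OneHotEncoder has a similar issue: it expands each label into its own dimension. Use LabelEncoder instead, which maps every label to a single integer (labels are sorted first, so the codes follow alphabetical order):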
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df['Sizes'])
df['Category'] = le.transform(df['Sizes']) + 1
Outputs:
Sizes Category
0 Small 3
1 Medium 2
2 Large 1
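For completeness, a self-contained version of the snippet above that can be run as-is. The three-row frame is assumed from the output shown; `fit_transform` is used as a shorthand for the separate fit and transform calls, and `inverse_transform` shows how to get back from codes to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample frame assumed from the output above.
df = pd.DataFrame({"Sizes": ["Small", "Medium", "Large"]})

le = LabelEncoder()
# fit_transform learns the sorted label set and encodes in one step;
# codes start at 0, so add 1 to start numbering at 1.
df["Category"] = le.fit_transform(df["Sizes"]) + 1

# classes_ holds the labels in sorted order; position i is code i.
print(list(le.classes_))

# Undo the +1 shift before mapping codes back to labels.
original = le.inverse_transform(df["Category"].to_numpy() - 1)
print(list(original))
```

Because the codes come from sorting, they carry no notion of size here: "Large" gets 1 and "Small" gets 3 only because of alphabetical order.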