I have a data frame with one row per product entry and columns holding the product's characteristics.
For every unique value in every characteristic column, I would like to create a new dummy variable that is 1 if that value occurs for the product and 0 otherwise.
As an example:
import pandas as pd
df = pd.DataFrame({'id': ['prod_A', 'prod_A', 'prod_B', 'prod_B'],
                   'color': ['red', 'green', 'red', 'black'],
                   'size': [1, 2, 3, 4]})
and I would like to end up with a data frame like this:
df_f = pd.DataFrame({'id': ['prod_A', 'prod_B'],
                     'color_red': [1, 1],
                     'color_green': [1, 0],
                     'color_black': [0, 1],
                     'size_1': [1, 0],
                     'size_2': [1, 0],
                     'size_3': [0, 1],
                     'size_4': [0, 1]})
Any ideas?
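Use get_dummies, then aggregate with max per id. The two statements below are alternatives; run either one on the original df, not both:
# Option 1: dummies for every column except `id`
# (cast to str first, otherwise the numeric `size` column is not encoded;
#  groupby(level=0).max() replaces max(level=0), which newer pandas no longer accepts)
df = pd.get_dummies(df.set_index('id').astype(str), dtype=int).groupby(level=0).max().reset_index()
# Option 2: dummies only for the columns in the list
# (dtype=int keeps 0/1 output where get_dummies would otherwise return booleans)
df = pd.get_dummies(df, columns=['color','size'], dtype=int).groupby('id', as_index=False).max()
print(df)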
       id  color_black  color_green  color_red  size_1  size_2  size_3  size_4
0  prod_A            0            1          1       1       1       0       0
1  prod_B            1            0          1       0       0       1       1
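For reference, here is a minimal end-to-end sketch of the listed-columns variant that also reorders the columns to match the layout asked for in the question. The names `dummies`, `result` and `wanted` are introduced here for illustration only, and `dtype=int` is passed on the assumption that your pandas version might otherwise return booleans instead of 0/1.

```python
import pandas as pd

df = pd.DataFrame({'id': ['prod_A', 'prod_A', 'prod_B', 'prod_B'],
                   'color': ['red', 'green', 'red', 'black'],
                   'size': [1, 2, 3, 4]})

# One indicator column per (column, value) pair; dtype=int gives 0/1 values
dummies = pd.get_dummies(df, columns=['color', 'size'], dtype=int)

# One row per product: the max is 1 if any row of that product had the value
result = dummies.groupby('id', as_index=False).max()

# Reorder columns to the layout requested in the question (hypothetical helper list)
wanted = ['id', 'color_red', 'color_green', 'color_black',
          'size_1', 'size_2', 'size_3', 'size_4']
result = result[wanted]
print(result)
```

Taking the max works because each dummy is 0 or 1, so the group maximum is 1 exactly when at least one row of that product carries the value.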