I can successfully convert the two columns to a matrix using the following commands.
dfb = datab.parse("a")
dfb
     Name        Product
0    Mike     Apple,Pear
1    John  Orange,Banana
2     Bob         Banana
3  Connie           Pear
pd.get_dummies(dfb.Product).groupby(dfb.Name).apply(max)
        Apple,Pear  Banana  Orange,Banana  Pear
Name
Bob              0       1              0     0
Connie           0       0              0     1
John             0       0              1     0
Mike             1       0              0     0
However, the matrix that I want to have is the following.
        Apple  Banana  Orange  Pear
Name
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
1.
Use set_index together with str.get_dummies, which splits each string on the separator before encoding:
df = dfb.set_index('Name').Product.str.get_dummies(',')
print(df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
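For anyone without the original Excel file, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the sample rows in the question (with Pear capitalised consistently, which is an assumption on my part), and the name `result` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: the real
# data comes from an Excel sheet via datab.parse("a")).
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
})

# Series.str.get_dummies splits each string on the separator and
# creates one 0/1 indicator column per distinct token.
result = dfb.set_index("Name")["Product"].str.get_dummies(sep=",")
print(result)
```

Note that the tokens are case-sensitive, so a value like "pear" would end up in its own column next to "Pear".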
2.
A solution with pandas.get_dummies: split Product into a new DataFrame, encode it, then group by column labels (axis=1, level=0) and aggregate with max so that repeated labels are merged:
dfb = dfb.set_index('Name')
df = (pd.get_dummies(dfb.Product.str.split(',', expand=True), prefix='', prefix_sep='')
        .groupby(axis=1, level=0).max())
print(df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
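The axis=1 argument of groupby is deprecated in recent pandas releases (since 2.1, as far as I know), so on a newer version the same idea can probably be written by transposing, grouping on the index, and transposing back. This is only a sketch: the sample data is rebuilt by hand, `parts`, `dummies` and `result` are illustrative names, and dtype=int is passed so the output is 0/1 rather than booleans.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption).
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
}).set_index("Name")

# One column per position in the comma-separated list.
parts = dfb["Product"].str.split(",", expand=True)

# Encode every position; empty prefixes leave duplicated column labels
# such as two "Banana" columns.
dummies = pd.get_dummies(parts, prefix="", prefix_sep="", dtype=int)

# Merge duplicated labels without groupby(axis=1):
# transpose, group on the (former column) labels, transpose back.
result = dummies.T.groupby(level=0).max().T
print(result)
```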
3.
Solution with split and MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(dfb.Product.str.split(',')),
                  columns=mlb.classes_,
                  index=dfb.Name)
print(df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
If the Name column contains duplicates, aggregate the rows per name with max:
df = df.groupby('Name').max()
print(df)
        Apple  Banana  Orange  Pear
Name
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
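To make the duplicate case concrete, here is a small sketch with made-up data in which Mike appears on two rows. The rows and the names `dummies` and `result` are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data: Mike appears twice with different products.
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Mike"],
    "Product": ["Apple", "Orange,Banana", "Pear"],
})

# Indicator columns, still one row per original row (Mike twice).
dummies = dfb.set_index("Name")["Product"].str.get_dummies(sep=",")

# Rows sharing a Name are merged; max keeps a 1 if any of them had
# the product. The grouped index comes back sorted by name.
result = dummies.groupby(level="Name").max()
print(result)
```

Using max rather than sum keeps the matrix binary even if the same product is listed on several rows for one person.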