Pandas/Python: converting two columns to a matrix, with column names in the matrix

Tags: python, pandas

I can convert the two columns into a matrix using the following commands:

dfb = datab.parse("a")

dfb

    Name    Product
0   Mike    Apple,pear
1   John    Orange,Banana
2   Bob     Banana
3   Connie  Pear


pd.get_dummies(dfb.Product).groupby(dfb.Name).apply(max)


        Apple,pear  Banana  Orange,Banana  Pear
Name
Bob              0       1              0     0
Connie           0       0              0     1
John             0       0              1     0
Mike             1       0              0     0
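The columns above come from pd.get_dummies treating each complete cell as one category, so "Apple,pear" and "Orange,Banana" are never split. A small sketch to reproduce this without the Excel file: the in-memory DataFrame is an assumption standing in for datab.parse("a"), with the same four rows as above. It compares the columns produced with and without splitting on the comma. Note that the lowercase "pear" still ends up as its own column after splitting, because the matching is case-sensitive.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample sheet; stands in for datab.parse("a").
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,pear", "Orange,Banana", "Banana", "Pear"],
})

# Without splitting, each distinct cell string is a single category.
whole = pd.get_dummies(dfb["Product"])
print(list(whole.columns))  # ['Apple,pear', 'Banana', 'Orange,Banana', 'Pear']

# Series.str.get_dummies splits every cell on the separator before encoding.
split = dfb["Product"].str.get_dummies(",")
print(list(split.columns))  # ['Apple', 'Banana', 'Orange', 'Pear', 'pear']
```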

However, this is the matrix I want:

        Apple  Banana  Orange  Pear
Name
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
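Mike's row is spelled "Apple,pear", but the target matrix has only one Pear column. Assuming product names differ only in capitalization, one option is to normalize them with Series.str.title before splitting. Here is a minimal sketch on the same hypothetical in-memory data that also sorts the rows by name to match the layout above:

```python
import pandas as pd

# Hypothetical in-memory copy of the sample sheet; stands in for datab.parse("a").
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,pear", "Orange,Banana", "Banana", "Pear"],
})

# Normalize capitalization ("Apple,pear" -> "Apple,Pear"), then split on
# commas, one-hot encode, and sort the rows alphabetically by name.
df = (dfb.set_index("Name")["Product"]
         .str.title()
         .str.get_dummies(",")
         .sort_index())
print(df)
```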
asked Dec 11 '25 07:12 by sfhotmail


1 Answer

1.

Use set_index together with str.get_dummies:

df = dfb.set_index('Name').Product.str.get_dummies(',')
print (df)
        Apple  Banana  Orange  Pear
Name                               
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
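The printed result implies that Mike's row reads "Apple,Pear". With the lowercase "pear" from the question, str.get_dummies would add a fifth pear column. The rows also keep their input order, while the matrix in the question is sorted by name. Below is a runnable sketch of this solution, assuming the capitalized spelling, with an extra sort_index:

```python
import pandas as pd

# Sample data; assumes "Apple,Pear" (capital P), as implied by the output above.
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
})

df = dfb.set_index("Name").Product.str.get_dummies(",")
print(list(df.index))  # ['Mike', 'John', 'Bob', 'Connie'] (input order)

# Sort rows alphabetically to match the layout requested in the question.
df = df.sort_index()
print(list(df.index))  # ['Bob', 'Connie', 'John', 'Mike']
```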

2.

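Solution with pandas.get_dummies on the new DataFrame created by split, then merging the duplicate column labels with groupby on level=0 and aggregating with max. Because groupby(axis=1) is deprecated in recent pandas, the frame is transposed, grouped by its index, and transposed back. dtype=int keeps 0/1 instead of True/False:

s = dfb.set_index('Name').Product
df = (pd.get_dummies(s.str.split(',', expand=True), prefix='', prefix_sep='', dtype=int)
        .T.groupby(level=0).max().T)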
print (df)
        Apple  Banana  Orange  Pear
Name                               
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
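To see why the groupby step is needed, it can help to inspect the intermediate frame. The sketch below uses the same four rows, assuming "Apple,Pear" as implied by the output. split with expand=True produces one column per position, and get_dummies with empty prefixes then emits the same label (Banana, Pear) once per position. The duplicates are merged by transposing, grouping on the index, and transposing back, which avoids groupby(axis=1).

```python
import pandas as pd

# Sample data; assumes "Apple,Pear" as implied by the output above.
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
})
s = dfb.set_index("Name")["Product"]

# expand=True gives one column per position; shorter rows are padded with missing values.
parts = s.str.split(",", expand=True)

# Empty prefixes keep the bare product names, so a product seen in both
# positions (Banana, Pear) yields two columns with the same label.
dummies = pd.get_dummies(parts, prefix="", prefix_sep="", dtype=int)
print(list(dummies.columns))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Banana', 'Pear']

# Merge duplicate labels: transpose, group rows by label, take the max, transpose back.
df = dummies.T.groupby(level=0).max().T
print(df)
```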

3.

Solution with split and scikit-learn's MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

df = pd.DataFrame(mlb.fit_transform(dfb.Product.str.split(',')),
                  columns=mlb.classes_, 
                  index=dfb.Name)
print (df)
        Apple  Banana  Orange  Pear
Name                               
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1

If the Name column contains duplicates, merge their rows with groupby and max:

df = df.groupby('Name').max()
print (df)
        Apple  Banana  Orange  Pear
Name                               
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
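Here is a sketch of the duplicate case, using made-up data in which Mike appears on two rows. It follows the groupby/max route above. For comparison, it also shows an alternative not taken from the answer: explode the split lists to one product per row and count with pd.crosstab, clipping the counts to 1 so the result stays a 0/1 matrix. Both routes should give the same table.

```python
import pandas as pd

# Hypothetical input in which "Mike" appears on two rows.
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie", "Mike"],
    "Product": ["Apple", "Orange,Banana", "Banana", "Pear", "Pear"],
})

# One row per input row, so "Mike" occurs twice in the index.
df = dfb.set_index("Name")["Product"].str.get_dummies(",")

# Collapse repeated names: a product counts if any of that person's rows has it.
merged = df.groupby(level="Name").max()
print(merged)

# Alternative (not from the answer): one product per row, then cross-tabulate.
long_df = (dfb.assign(Product=dfb["Product"].str.split(","))
              .explode("Product", ignore_index=True))
alt = (pd.crosstab(long_df["Name"], long_df["Product"])
         .clip(upper=1)                 # counts above 1 become plain indicators
         .rename_axis(columns=None))    # drop the "Product" columns label
print(alt)
```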
answered Dec 13 '25 21:12 by jezrael


