extract attributes from pandas columns that satisfy a condition

Let's say I have a table of frequencies of 3 different variables: M1, M2 and M3, over different instances: P1, ... P4:

import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]

df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

Now, for each row, I want to build a string listing the variables that occur in it (those with a nonzero frequency), so that the final output would be something like:

output = pd.DataFrame([("M1+M2+M3"), ("M3"), ("M1+M2+M3"), ("M1+M2")], columns = ["label"], index = ["P1", "P2", "P3", "P4"])

I thought about using something like np.where(df_test != 0), but how do I then paste the matching column names into the output as a string?

La Cordillera asked Oct 28 '25 06:10


2 Answers

You can use np.where to fill the cells with labels and then join them as a string.

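import numpy as np
import pandas as pd

(
    # Replace each True cell with its column name and each False cell with None.
    # Returning a Series (not a bare array) keeps the row index P1..P4.
    df_test.gt(0).apply(lambda x: pd.Series(np.where(x, x.name, None), index=x.index))
    # Join the remaining names in each row with '+'
    .apply(lambda x: '+'.join(x.dropna()), axis=1)
    .to_frame('label')
)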


    label
P1  M1+M2+M3
P2  M3
P3  M1+M2+M3
P4  M1+M2
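
As a possible variation on the approach above, the intermediate frame of labels can be skipped by using each row of the boolean mask to index the column names directly. This is only a minimal sketch; the names `mask` and `labels` are illustrative and not part of the original answer, and it assumes that any nonzero value counts as an occurrence.

```python
import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

# Boolean mask: True where the frequency is nonzero (assumption: nonzero means "occurs")
mask = df_test.ne(0)

# For each row of the mask, pick the column names where it is True and join them
labels = ["+".join(mask.columns[row]) for row in mask.to_numpy()]

# Reuse the original row index so the labels stay aligned with P1..P4
output = pd.DataFrame({"label": labels}, index=df_test.index)
print(output)
```

Because the mask is built with `ne(0)`, this matches the `!= 0` test from the question rather than `gt(0)`; for non-negative frequencies the two give the same labels.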
Allen answered Oct 29 '25 20:10


I have done it this way and I hope it helps you:

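import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

new = []
for row in df_test.itertuples():
    # Collect the names of the columns that are nonzero in this row
    aux = []
    if row.M1 != 0: aux.append('M1')
    if row.M2 != 0: aux.append('M2')
    if row.M3 != 0: aux.append('M3')
    # Join the collected names into one label and store it for this row
    new.append('+'.join(aux))
output = pd.DataFrame(new, columns=["label"], index=["P1", "P2", "P3", "P4"])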
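
The loop above names each column explicitly, so it would need editing if the table gained more variables. A minimal sketch of the same idea that loops over whatever columns are present might look like this; it assumes, as in the question, that a nonzero value means the variable occurs, and the variable names simply mirror the ones used above.

```python
import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

new = []
for _, row in df_test.iterrows():
    # Keep the names of the columns whose value in this row is nonzero
    aux = [col for col, value in row.items() if value != 0]
    new.append("+".join(aux))

# Take the index from df_test instead of retyping the row names
output = pd.DataFrame(new, columns=["label"], index=df_test.index)
print(output)
```

Row-by-row iteration is fine for a small table like this one; for large frames a mask-based approach is usually faster.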
Jonathan Sánchez answered Oct 29 '25 20:10


