I have a pandas dataframe that has data as in the below table:
Negative Positive Neutral
True False False
True False False
False False True
False True False
True False False
False True False
True False False
True False False
What I am doing is to create a new column ("Overall") and depending upon conditions that if the row value for column "Positive" is True, Overall column gets the value "Positive", if Column "Negative" is True, then Overall will take "Negative" otherwise "Neutral" values:
def flag_df(df):
if (df["Negative"] == "True") and (df["Positive"] == "False") and (df["Neutral"] == "False"):
return "Negative"
elif (df["Negative"] == "False") and (df["Positive"] == "True") and (df["Neutral"] == "False"):
return "Positive"
else :
return "Neutral"
fdf['Overall'] = fdf.apply(flag_df, axis = 1)
but unfortunately, I don't know what I did wrong, all the observations in "Overall" column is coming out to be "Neutral":
Negative Positive Neutral Overall
True False False Neutral
True False False Neutral
False False True Neutral
False True False Neutral
True False False Neutral
False True False Neutral
True False False Neutral
True False False Neutral
Can someone please let me know where I did wrong?
If all columns are boolean and always only one True per row is possible use DataFrame.dot:
print (df.dtypes)
Negative bool
Positive bool
Neutral bool
dtype: object
df['Overall'] = df.dot(df.columns)
print (df)
Negative Positive Neutral Overall
0 True False False Negative
1 True False False Negative
2 False False True Neutral
3 False True False Positive
4 True False False Negative
5 False True False Positive
6 True False False Negative
7 True False False Negative
If multiple columns names use:
cols = ['Negative', 'Positive', 'Neutral']
df['Overall'] = df[cols].dot(pd.Index(cols))
Or:
df1 = df[cols]
df['Overall'] = df1.dot(df1.columns)
Your solution should be changed by numpy.select:
m1 = df["Negative"] & ~df["Positive"] & ~df["Neutral"]
m2 = ~df["Negative"] & df["Positive"] & ~df["Neutral"]
df['Overall'] = np.select([m1, m2], ['Negative','Positive'], default='Neutral')
print (df)
Negative Positive Neutral Overall
0 True False False Negative
1 True False False Negative
2 False False True Neutral
3 False True False Positive
4 True False False Negative
5 False True False Positive
6 True False False Negative
7 True False False Negative
If possible multiple match - Trues per rows is possible add separator to columns anmes and then remove last ,:
print (df)
Negative Positive Neutral
0 True False True
1 True False False
2 False False True
3 False True False
4 True False False
5 False True False
6 True False False
7 True False False
df['Overall'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
Negative Positive Neutral Overall
0 True False True Negative,Neutral
1 True False False Negative
2 False False True Neutral
3 False True False Positive
4 True False False Negative
5 False True False Positive
6 True False False Negative
7 True False False Negative
IIUC, you could use idxmax:
df['Overall'] = df.idxmax(axis=1)
print(df)
Output
Negative Positive Neutral Overall
0 True False False Negative
1 True False False Negative
2 False False True Neutral
3 False True False Positive
4 True False False Negative
5 False True False Positive
6 True False False Negative
7 True False False Negative
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With