new column in pandas dataframe failed to get the expected values basis if conditions on multiple columns

Question

I have a pandas dataframe that has data as in the below table:

Negative  Positive  Neutral
True      False     False
True      False     False
False     False     True
False     True      False
True      False     False
False     True      False
True      False     False
True      False     False

What I am doing is to create a new column ("Overall") and depending upon conditions that if the row value for column "Positive" is True, Overall column gets the value "Positive", if Column "Negative" is True, then Overall will take "Negative" otherwise "Neutral" values:

def flag_df(df):
    if (df["Negative"] == "True") and (df["Positive"] == "False") and (df["Neutral"] == "False"):
        return "Negative"
    elif (df["Negative"] == "False") and (df["Positive"] == "True") and (df["Neutral"] == "False"):
        return "Positive"
    else :
        return "Neutral"

fdf['Overall'] = fdf.apply(flag_df, axis = 1)

but unfortunately, I don't know what I did wrong, all the observations in "Overall" column is coming out to be "Neutral":

Negative    Positive    Neutral     Overall
True           False     False      Neutral
True           False     False      Neutral
False          False     True       Neutral
False          True      False      Neutral
True           False     False      Neutral
False          True      False      Neutral
True           False     False      Neutral
True           False     False      Neutral

Can someone please let me know where I did wrong?

jezrael · Accepted Answer

If all columns are boolean and always only one True per row is possible use DataFrame.dot:

print (df.dtypes)
Negative    bool
Positive    bool
Neutral     bool
dtype: object

df['Overall'] = df.dot(df.columns)
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

If multiple columns names use:

cols = ['Negative', 'Positive', 'Neutral']
df['Overall'] = df[cols].dot(pd.Index(cols))

Or:

df1 = df[cols]
df['Overall'] = df1.dot(df1.columns)

Your solution should be changed by numpy.select:

m1 = df["Negative"] & ~df["Positive"] & ~df["Neutral"]
m2 = ~df["Negative"] & df["Positive"] & ~df["Neutral"]

df['Overall'] = np.select([m1, m2], ['Negative','Positive'], default='Neutral')
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

If possible multiple match - Trues per rows is possible add separator to columns anmes and then remove last ,:

print (df)
   Negative  Positive  Neutral
0      True     False     True
1      True     False    False
2     False     False     True
3     False      True    False
4      True     False    False
5     False      True    False
6      True     False    False
7      True     False    False


df['Overall'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
   Negative  Positive  Neutral           Overall
0      True     False     True  Negative,Neutral
1      True     False    False          Negative
2     False     False     True           Neutral
3     False      True    False          Positive
4      True     False    False          Negative
5     False      True    False          Positive
6      True     False    False          Negative
7      True     False    False          Negative

Dani Mesejo · Answer

IIUC, you could use idxmax:

df['Overall'] = df.idxmax(axis=1)
print(df)

Output

   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

new column in pandas dataframe failed to get the expected values basis if conditions on multiple columns

Tags:

python

python-3.x

pandas

if-statement

loveR

2 Answers

jezrael

Dani Mesejo

Recent Activity

Donate For Us

new column in pandas dataframe failed to get the expected values basis if conditions on multiple columns

Tags:

python

python-3.x

pandas

if-statement

loveR

2 Answers

jezrael

Dani Mesejo

Related questions

Recent Activity

Donate For Us