I have a non-indexed Pandas dataframe where each row consists of numeric and boolean values with some NaNs. An example row in my dataframe might look like this (with variables above):
X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN
I would like to add a new variable to my dataframe, call it X_13, which is the number of True values in each row. So in the above case, I would like to obtain:
X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12 X_13
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN  2
I have tried df['X_13'] = df['X_2'] + df['X_4'] + df['X_8'] + df['X_10'], and that gives me what I want unless the row contains a NaN in a location where a Boolean is expected. For those rows, X_13 has the value NaN.
Sorry -- this feels like it should be absurdly simple. Any suggestions?
Select boolean columns and then sum:
df.select_dtypes(include=['bool']).sum(axis=1)
If you have NaNs, fill them with False first:
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
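If you already know which columns hold the booleans, as in the question, a minimal sketch along these lines may also work. The two-row frame and the bool_cols list are made up for illustration. Comparing only those columns with True treats NaN as not-True, so no filling is needed:

```python
import numpy as np
import pandas as pd

# Illustrative data modelled on the question; the values are assumptions.
df = pd.DataFrame({
    "X_1": [24.4, 3.2],
    "X_2": [True, np.nan],    # NaN where a boolean is expected -> object dtype
    "X_4": [False, True],
    "X_8": [True, False],
    "X_10": [False, np.nan],
})

# Assumed to be known in advance; adjust to your own column names.
bool_cols = ["X_2", "X_4", "X_8", "X_10"]

# NaN == True evaluates to False, so missing values simply do not count.
df["X_13"] = df[bool_cols].eq(True).sum(axis=1)
print(df)
```

This is only safe because the selection excludes numeric columns, where a value of 1 would also compare equal to True.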
Consider this DataFrame:
df
Out: 
       a      b  c     d
0   True  False  1  True
1  False   True  2   NaN
Comparing with df == True is not enough, because it also returns True at (0, c): the integer 1 compares equal to True.
df == True
Out: 
       a      b      c      d
0   True  False   True   True
1  False   True  False  False
So if you sum that result across row 0, you will get 3 instead of 2. Another important point is that boolean arrays cannot contain NaNs, so a column that mixes booleans and NaN is stored as object. If you check the dtypes, you will see:
df.dtypes
Out: 
a      bool
b      bool
c     int64
d    object
dtype: object
Filling the NaNs with False turns column d into a boolean column:
df.fillna(False).dtypes
Out: 
a     bool
b     bool
c    int64
d     bool
dtype: object
Now you can safely sum across the boolean columns only:
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Out: 
0    2
1    1
dtype: int64
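Whether fillna(False) converts an object column to bool can depend on the pandas version (recent releases warn about that automatic downcast). As a hedged alternative, the sketch below rebuilds the example frame and counts cells that are genuinely boolean True without relying on the conversion. is_true is a helper name invented here, not a pandas function:

```python
import numpy as np
import pandas as pd

# Same example frame as above, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

def is_true(value):
    """Return True only for a real boolean True (not 1, not NaN)."""
    return isinstance(value, (bool, np.bool_)) and bool(value)

# Apply the check cell by cell, then count the True cells in each row.
true_counts = df.apply(lambda col: col.map(is_true)).sum(axis=1)
print(true_counts)
```

The cell-by-cell check is slower than select_dtypes on large frames, so it is probably best kept for cases where the dtype-based approach does not behave as expected.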