What would be a more elegant way to writing:
df[df['income'] > 0].count()['income']
I would like to simply count the number of column values meeting a condition (in this example, the condition is just being larger than zero, but I would like a way applicable to any arbitrary condition or set of conditions). Obviously more elegant if the column name would not need to show up twice in the expression. Should be hopefully easy.
df = pd.DataFrame([0, 30000, 75000, -300, 23000], columns=['income'])
print(df)
income
0 0
1 30000
2 75000
3 -300
4 23000
If you would like to count values in a column meeting a slightly more complex condition than just being positive, for example "value is in the range from 5000 to 25000", you can use two methods.
First, using boolean indexing,
((df['income'] > 5000) & (df['income'] < 25000)).sum()
Second, applying a function on every row of the series,
df['income'].map(lambda x: 5000 < x < 25000).sum()
Note that the second approach allows arbitrarily complex condition but is much slower than the first approach which is using low-level operations on the underlying arrays. See the documentation on boolean indexing for more information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With