Spark: count and filtered count in the same query

In SQL, a query like

SELECT count(id), sum(if(column1 = 1, 1, 0)) FROM groupedTable

can compute both the total number of records and the number of filtered records in a single pass.

How can I express this in the Spark DataFrame API, i.e. without needing to join one of the counts back to the original DataFrame?

asked Jan 24 '26 by Georg Heiler

1 Answer

Just use count for both cases:

import org.apache.spark.sql.functions.{count, when}
df.select(count($"id"), count(when($"column1" === 1, true)))

This works because count skips null values, and when without an otherwise clause returns null whenever the condition is not satisfied. If the column is nullable, you should correct for that (for example with coalesce or isNull, depending on the desired output).
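
For context, here is a minimal, self-contained sketch of the same idea, including one way to correct for a nullable column with coalesce. The SparkSession setup, the sample data, and the choice to treat null as 0 are assumptions for illustration, not part of the original answer; the column names (id, column1) come from the question.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, count, lit, when}

val spark = SparkSession.builder().appName("counts").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data with one null in column1.
val df = Seq(
  (1, Option(1)),
  (2, Option(0)),
  (3, Option.empty[Int])
).toDF("id", "column1")

df.select(
  count($"id").as("total"),                           // counts every non-null id
  count(when($"column1" === 1, true)).as("filtered"), // when(...) is null unless column1 = 1, so count only sees matches
  count(when(coalesce($"column1", lit(0)) === 1, true)).as("filtered_null_as_zero") // explicitly treats null as 0 first
).show()
// For this sample, all three in one pass: total = 3, filtered = 1, filtered_null_as_zero = 1.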

answered Jan 27 '26 by zero323


