In SQL, something like
SELECT count(id), sum(if(column1 = 1, 1, 0)) FROM groupedTable
can be formulated to compute both the total record count and a filtered count in a single pass.
How can I do the same with the Spark DataFrame API? That is, without needing to join one of the counts back to the original data frame.
Just use count for both cases:
df.select(count($"id"), count(when($"column1" === 1, true)))
If the column is nullable you should correct for that (for example with coalesce or IS NULL, depending on the desired output).
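A minimal, self-contained sketch of this single-pass approach, assuming a local SparkSession and a made-up DataFrame with nullable id and column1 columns (the column names and sample data are only for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("single-pass-counts").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one row has a null id, one has a null column1.
val df = Seq(
  (Some(1), Some(1)),
  (Some(2), Some(0)),
  (None,    None)
).toDF("id", "column1")

df.select(
  count($"id").as("total"),                           // count(id) skips rows where id is null -> 2
  count(lit(1)).as("all_rows"),                       // counts every row regardless of nulls -> 3
  count(when($"column1" === 1, true)).as("matching")  // when(...) without otherwise() yields null for non-matches, which count ignores -> 1
).show()

Both aggregates are evaluated in the same job over a single scan of the data, which is exactly what the SQL sum(if(...)) trick achieves; count(when(...)) works because count only counts non-null values.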