Count distinct values across more than one column

Question

I need to count distinct values that have been spread across more than one column. I have a method but suspect there is a more elegant way.

Example input:

year	project	staff 1	staff 2	staff 3	cost
2022	A		Ian		100
2023	A	Jim	Anne		200
2024	A	Anne	Jim	Peter	300
2023	B	Anne	Sue		400
2024	B			Dave	500

Expected output:

project	number_distinct_staff	total_cost
A	4	600
B	3	900

Current method:

SELECT project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(cost) AS total_cost
FROM (
    SELECT project, staff_1 AS staff, cost AS cost FROM table UNION ALL
    SELECT project, staff_2 AS staff, NULL AS cost FROM table UNION ALL
    SELECT project, staff_3 AS staff, NULL AS cost FROM table
) AS sub
GROUP BY project

This approach feels clumsy to me. I am worried that the asymmetric handling of columns (like cost in the example) makes the process vulnerable to double counting.

Is there a better way?

Yitzhak Khabinsky · Accepted Answer

Please try the following solution leveraging a Table Value Constructor.

For the refence: The Table Value Constructor (aka the VALUES keyword)

Basically, we are converting columns to rows via VALUES (...)

db-fiddle

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (project CHAR(1), staff_1 VARCHAR(20), staff_2 VARCHAR(20), staff_3 VARCHAR(20), cost INT);
INSERT @tbl (project, staff_1, staff_2, staff_3, cost) VALUES
('A', NULL,  'Ian',  NULL, 100),
('A', 'Jim', 'Anne', NULL, 200),
('A', 'Anne', 'Jim', 'Peter', 300),
('B', 'Anne', 'Sue', NULL, 400),
('B', NULL,   NULL, 'Dave', 500);
-- DDL and sample data population, end

SELECT t1.project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(t1.cost) AS total_cost
FROM @tbl AS t
CROSS APPLY (VALUES
    (project, staff_1, cost),
    (project, staff_2, NULL),
    (project, staff_3, NULL)) AS t1(project, staff, cost)
GROUP BY t1.project;

Output

project	num_distinct_staff	total_cost
A	4	600
B	3	900

GuidoG · Answer

First off, your solution is correct you just had a little error,
see my dbFiddle

-- your solution after correction
SELECT project
    , COUNT(DISTINCT staff) AS num_distinct_staff
    , SUM(cost) AS total_cost
FROM (
    SELECT project, staff_1 as staff, cost FROM tbl UNION ALL
    SELECT project, staff_2, NULL AS staff FROM tbl UNION ALL
    SELECT project, staff_3, NULL AS staff FROM tbl
) AS sub
GROUP BY project

If you want to try something other than your solution or the other answer, another way is to use both string_agg and string_split like this

select t.project,
       count(distinct value) as number_distinct_staff,
       min(t.costs) as total_cost
from   ( select project,
                sum(cost) as costs,
                string_agg(staff_1, ',') + ',' + string_agg(staff_2, ',') + ',' + string_agg(staff_3, ',') as staffs
         from   tbl
         group by project
       )  t
  cross apply string_split(t.staffs, ',')
group by t.project

Here is the dbFiddle

And the result is

project	number_distinct_staff	total_cost
A	4	600
B	3	900

Count distinct values across more than one column

Tags:

sql

sql-server

Simon.S.A.

2 Answers

Yitzhak Khabinsky

GuidoG

Recent Activity

Donate For Us

Count distinct values across more than one column

Tags:

sql

sql-server

Simon.S.A.

2 Answers

Yitzhak Khabinsky

GuidoG

Related questions

Recent Activity

Donate For Us