I have an SQL table with 4 columns:
id - varchar(50)
g1 - varchar(50)
g2 - varchar(50)
datetime - timestamp
I have this query:
SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
ORDER BY share DESC
This query was built to answer: what is the distribution of g1 values across users?
Each id may have multiple records in the table. I want to consider only the earliest one, that is, the record with the minimum datetime value.
Sample data:
id  g1  g2     datetime
x1  a   start  2016-01-19 21:01:22
x1  c   start  2016-01-19 21:01:21
x2  b   start  2016-01-19 09:03:42
x1  a   start  2016-01-18 13:56:45
Current output:
g1  count  total  share
a   2      4      0.5
b   1      4      0.25
c   1      4      0.25
We have 4 records, but I only want to consider two of them:
x2  b   start  2016-01-19 09:03:42
x1  a   start  2016-01-18 13:56:45
which are the earliest records per id.
Expected output:
g1  count  total  share
a   1      2      0.5
b   1      2      0.5
How do I consider only the earliest record per id in the GROUP BY?
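Here is a solution which should work in SQL Server and any other database that supports CTEs. It joins each id to its earliest datetime, counts the surviving rows per g1, and divides each count by the sum of all counts:
WITH cte AS
(
    -- Keep only each id's earliest 'start' record, then count per g1
    SELECT t1.g1,
           COUNT(*) AS cnt
    FROM yourTable t1
    INNER JOIN
    (
        SELECT id, MIN(datetime) AS datetime
        FROM yourTable
        WHERE g2 = 'start'
        GROUP BY id
    ) t2
        ON t1.id = t2.id AND
           t1.datetime = t2.datetime
    WHERE t1.g2 = 'start'
    GROUP BY t1.g1
)
SELECT t.g1,
       t.cnt AS count,
       (SELECT SUM(cnt) FROM cte) AS total,
       -- Cast to float to avoid integer division
       CAST(t.cnt AS float) / (SELECT SUM(cnt) FROM cte) AS share
FROM cte t
ORDER BY share DESC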
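As a quick sanity check, the earliest-record-per-id approach can be run against the four sample rows from the question. The sketch below is only an illustration under some assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than SQL Server, stores the timestamps as ISO-formatted text, and casts to REAL instead of float. The helper name earliest_share is hypothetical, and the query groups by g1 and sums the per-group counts so that total reflects the number of ids considered.

```python
import sqlite3

# Sample rows from the question: (id, g1, g2, datetime).
# Timestamps are stored as ISO text, which sorts chronologically.
ROWS = [
    ("x1", "a", "start", "2016-01-19 21:01:22"),
    ("x1", "c", "start", "2016-01-19 21:01:21"),
    ("x2", "b", "start", "2016-01-19 09:03:42"),
    ("x1", "a", "start", "2016-01-18 13:56:45"),
]

QUERY = """
WITH cte AS (
    -- Keep only each id's earliest 'start' record, then count per g1
    SELECT t1.g1, COUNT(*) AS cnt
    FROM my_table t1
    INNER JOIN (
        SELECT id, MIN(datetime) AS datetime
        FROM my_table
        WHERE g2 = 'start'
        GROUP BY id
    ) t2
        ON t1.id = t2.id AND t1.datetime = t2.datetime
    WHERE t1.g2 = 'start'
    GROUP BY t1.g1
)
SELECT g1,
       cnt,
       (SELECT SUM(cnt) FROM cte) AS total,
       CAST(cnt AS REAL) / (SELECT SUM(cnt) FROM cte) AS share
FROM cte
ORDER BY share DESC, g1
"""


def earliest_share(rows):
    """Load rows into an in-memory table and return (g1, count, total, share) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_share(ROWS):
        print(row)
```

On the sample data this keeps one row for x1 (g1 = a) and one for x2 (g1 = b), matching the expected output in the question. If two records of the same id share the exact minimum datetime, both survive the join, so that id would be counted more than once.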