I have an SQL table with 4 columns:
id       - varchar(50)
g1       - varchar(50)
g2       - varchar(50)
datetime - timestamp
I have this query:
SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
ORDER BY share DESC
This query was built to answer: what is the distribution of g1 values across users?
Each id may have multiple records in the table. I want to consider only the earliest one, meaning the record with the minimum datetime value.
For example, given this data:
id    g1    g2      datetime
x1    a     start   2016-01-19 21:01:22
x1    c     start   2016-01-19 21:01:21
x2    b     start   2016-01-19 09:03:42
x1    a     start   2016-01-18 13:56:45
Counting every record gives:
g1  count   total   share
a   2       4       0.5
b   1       4       0.25
c   1       4       0.25
We have 4 records, but I only want to consider two of them:
x2    b     start   2016-01-19 09:03:42
x1    a     start   2016-01-18 13:56:45
which are the earliest records per id. The desired result is:
g1  count   total   share
a   1       2       0.5
b   1       2       0.5
How do I consider only the earliest record per id in the GROUP BY?
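Here is a solution that should work in SQL Server and any other database that supports CTEs. It joins the table to a subquery that finds the minimum datetime for each id, so only the earliest record per id is counted: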
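WITH cte AS
(
    -- Count, per g1 value, the rows that are the earliest record of their id
    SELECT t1.g1,
           COUNT(*) AS cnt
    FROM yourTable t1
    INNER JOIN
    (
        -- Earliest datetime for each id
        SELECT id, MIN(datetime) AS datetime
        FROM yourTable
        GROUP BY id
    ) t2
        ON t1.id = t2.id AND
           t1.datetime = t2.datetime
    GROUP BY t1.g1
)
SELECT t.g1,
       t.cnt AS count,
       (SELECT SUM(cnt) FROM cte) AS total,
       -- Cast to float to avoid integer division
       CAST(t.cnt AS float) / (SELECT SUM(cnt) FROM cte) AS share
FROM cte t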
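An alternative worth considering is ROW_NUMBER(), which picks exactly one row per id even when two records share the same minimum datetime (the join on MIN(datetime) would count both). The sketch below is not part of the original answer: it runs the idea against the question's sample data in an in-memory SQLite database through Python's sqlite3 module, purely so it can be checked locally. It assumes SQLite 3.25 or later for window-function support, applies the g2 = 'start' filter from the question, and uses illustrative names (ranked, first_rows, earliest_distribution) that are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, g1, g2, datetime)
ROWS = [
    ("x1", "a", "start", "2016-01-19 21:01:22"),
    ("x1", "c", "start", "2016-01-19 21:01:21"),
    ("x2", "b", "start", "2016-01-19 09:03:42"),
    ("x1", "a", "start", "2016-01-18 13:56:45"),
]

QUERY = """
WITH ranked AS (
    -- Number each id's records from earliest to latest
    SELECT id, g1,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime) AS rn
    FROM my_table
    WHERE g2 = 'start'
),
first_rows AS (
    -- Keep only the earliest record per id, then count per g1
    SELECT g1, COUNT(*) AS cnt
    FROM ranked
    WHERE rn = 1
    GROUP BY g1
)
SELECT g1,
       cnt,
       SUM(cnt) OVER () AS total,
       CAST(cnt AS REAL) / SUM(cnt) OVER () AS share
FROM first_rows
ORDER BY share DESC, g1
"""


def earliest_distribution(rows):
    """Load rows into an in-memory table and return the g1 distribution."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_distribution(ROWS):
        print(row)
```

With the sample data this yields one row each for a and b, with count 1, total 2 and share 0.5, matching the desired result in the question. The timestamps are stored as ISO-formatted text here, which sorts correctly in SQLite; with a real timestamp column the ORDER BY works the same way.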