I am trying to find distinct count of transacting client ids prior to the time of a transaction from this table and associate the count to each transaction
| transaction id | client id | transaction date |
|---|---|---|
| t1 | C1 | 01/01/2024 |
| t2 | C2 | 01/02/2024 |
| t3 | C2 | 01/02/2024 |
| t4 | C3 | 01/03/2024 |
| t5 | C1 | 01/04/2024 |
| t6 | C4 | 01/05/2024 |
Desired output:
| transaction id | client id | transaction date | prior distinct client count |
|---|---|---|---|
| t1 | C1 | 01/01/2024 | 0 |
| t2 | C2 | 01/02/2024 | 1 |
| t3 | C2 | 01/02/2024 | 1 |
| t4 | C3 | 01/03/2024 | 2 |
| t5 | C1 | 01/04/2024 | 3 |
| t6 | C4 | 01/05/2024 | 3 |
I tried using
select
transaction id,
client id,
transaction date,
count(distinct client id) over (order by transaction date asc rows between unbounded preceding and 1 preceding)
from
<table name>
However, snowflake does not support an OVER clause that contains an ORDER BY clause for DISTINCT operation in the window function.
I am also trying to avoid self join queries as the transaction table is very large. Any help with alternatives is appreciated.
create or replace table transaction_client as
select
column1 as transaction_id,
column2 as client_id,
column3 as transaction_date
from
values
('t1', 'C1', '01/01/2024'),
('t2', 'C2', '01/02/2024'),
('t3', 'C2', '01/02/2024'),
('t4', 'C3', '01/03/2024'),
('t5', 'C1', '01/04/2024'),
('t6', 'C4', '01/05/2024');
select
a.transaction_id,
a.client_id,
a.transaction_date,
count(distinct b.client_id) as prior_distinct_client_count
from
transaction_client a
left join transaction_client b
on b.transaction_date < a.transaction_date
group by a.transaction_id,a.client_id, a.transaction_date
order by a.transaction_date;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With