I have a dataset that contains email, city, state, zip and date. What I need is one row per email, with city, state and zip each filled with the latest non-null value available for that column.
Input:

Output:

I am using the query below, but it takes hours to run. Is there a more efficient way to get the desired output in SQL?
select Email_Addr,zip,
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn1
into #d1 from data where zip is not null and email_addr is not null;
select Email_Addr,city,
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn2 into #d2
from data where city is not null and email_addr is not null;
select Email_Addr,[state],
row_number()over(partition by Email_Addr order by email_effective_from desc) as rn3 into #d3
from data where state is not null and email_addr is not null;
select a.email_addr,a.zip,b.city,c.[state] into #dff from #d1 a
full outer join #d2 b on a.email_addr=b.email_addr
full outer join #d3 c on a.email_addr=c.email_addr
If you are running SQL Server 2022, one option uses last_value and ignore nulls:
select *
from (
select email, date,
last_value(city) ignore nulls over(partition by email order by date) city,
last_value(state) ignore nulls over(partition by email order by date) state,
last_value(zip) ignore nulls over(partition by email order by date) zip,
row_number() over(partition by email order by date desc) rn
from mytable t
) t
where rn = 1
| email | date | city | state | zip | rn |
|---|---|---|---|---|---|
| abc | 2023-01-04 | B | JP | 160007 | 1 |
Or we can use top (1) with ties instead of filtering in an outer query:
select top (1) with ties email, date,
last_value(city) ignore nulls over(partition by email order by date) city,
last_value(state) ignore nulls over(partition by email order by date) state,
last_value(zip) ignore nulls over(partition by email order by date) zip,
row_number() over(partition by email order by date desc) rn
from mytable t
order by row_number() over(partition by email order by date desc)
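To see why ordering by row_number() works with top (1) with ties, here is a minimal sketch on made-up rows. The CTE contents are hypothetical and only reuse the mytable, email and date names from the query above.

```sql
-- Hypothetical rows for two emails; the values are illustrative only.
with mytable as (
    select *
    from (values
        ('abc', cast('2023-01-01' as date)),
        ('abc', cast('2023-01-04' as date)),
        ('xyz', cast('2023-02-01' as date)),
        ('xyz', cast('2023-02-03' as date))
    ) v(email, [date])
)
select top (1) with ties email, [date]
from mytable
-- The latest row of every email gets row_number 1, so all of those rows
-- tie for first place and top (1) with ties keeps one row per email.
order by row_number() over(partition by email order by [date] desc);
```

This should return the 2023-01-04 row for abc and the 2023-02-03 row for xyz, the same rows that a where rn = 1 filter would keep.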
In earlier versions, one alternative uses a gaps-and-islands technique to build groups of rows, then aggregates over those groups:
select top (1) with ties email, date,
max(city) over(partition by email, grp_city ) city,
max(state) over(partition by email, grp_state) state,
max(zip) over(partition by email, grp_zip ) zip
from (
select t.*,
count(city) over(partition by email order by date) grp_city,
count(state) over(partition by email order by date) grp_state,
count(zip) over(partition by email order by date) grp_zip
from mytable t
) t
order by row_number() over(partition by email order by date desc)
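It may help to look at the intermediate groups on their own. The sketch below runs only the inner step for the city column on hypothetical sample rows (not the asker's data, which is not shown here).

```sql
-- Hypothetical sample rows for a single email; values are illustrative only.
with mytable as (
    select *
    from (values
        ('abc', cast('2023-01-01' as date), 'A',  'IN', null),
        ('abc', cast('2023-01-02' as date), null, 'JP', '160007'),
        ('abc', cast('2023-01-03' as date), 'B',  null, null),
        ('abc', cast('2023-01-04' as date), null, null, null)
    ) v(email, [date], city, [state], zip)
)
select email, [date], city,
    -- count() ignores nulls, so the running count only increases on rows
    -- where city is not null; each null row inherits the group of the
    -- last non-null row before it.
    count(city) over(partition by email order by [date]) grp_city
from mytable
order by email, [date];
```

With these rows grp_city comes out as 1, 1, 2, 2. Each group contains exactly one non-null city, so max(city) over(partition by email, grp_city) carries 'B' forward to the 2023-01-04 row, and the same reasoning applies to state and zip.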