Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance with NOT EXISTS - t-sql query

This (modified for simplicity) query is part of a larger query, and joined on date with other selects. However I have pinned this section to be dog slow. Say I have a UserLoginHistory-table that logs each login for a user. For each user I want the date they first logged in. (Later on in the query, I group by LogDate to get how many first-time logins there were each day.)

select
    LogDate, --(this value is only date, no time)
    UserId
from
    UserLoginHistory ul
where
    not exists
        (
            select 
                * 
            from 
                UserLoginHistory ulPrevious
            where
                ulPrevious.LogDate < ul.LogDate
                and ul.UserId = ulPrevious.UserId
        )
group by ul.LogDate, ul.UserId

Obviously the NOT EXISTS-part is the slow one. But I can't figure out how to replace it by something more efficient doing the same job.

With a small UserLogHistory-count, performance is no problem. It's when I get to around 15 000 it starts getting slow. Maybe I should batch the result for each day into another table, but I'd like to find a better solution to this query as there should be one out there...

Thanks for your time!

like image 532
cederlof Avatar asked Feb 20 '26 21:02

cederlof


1 Answers

You can use a row numbering method:

select LogDate,UserId from (
    select
       LogDate, 
       UserId
       row_number() over (partition by UserId order by LogDate) as rown
    from
        UserLoginHistory ul
)
where rown = 1

Rows for each ID are numbered by LogDate, so the earliest one will always be numbered 1.

Note: I don't think the group by in your original query was necessary--the not exists clause should guarantee that you only get unique combinations of UserId and LogDate.


Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!