I have a table (EMAIL) of email addresses:
EmailAddress
------------
[email protected]
[email protected]
[email protected]
[email protected]
and a table (BLACKLIST) of blacklisted email addresses:
EmailAddress
------------
[email protected]
[email protected]
and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:
SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
(
SELECT EmailAddress
FROM BLACKLIST
)
but when the row counts get very high the performance is terrible.
How can I better do this? (Assume generic SQL if possible. If not, assume T-SQL.)
You can use a left outer join, or a not exists clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in one.
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress), that should help speed up the execution of your query.
NOT IN differs from NOT EXISTS if the blacklist allow null value as EmailAddress. If there is a single null value the result of the query will always return zero rows because NOT IN (null) is unknown / false for every value. The query plans therefore differs slighyly but I don't think there would be any serious performance impact.
A suggestion is to create a new table called VALIDEMAIL, add a trigger to BLACKLIST that removes addresses from VALIDEMAIL when rows are inserted and add to VALIDEMAIL when removed from BLACKLIST. Then replace EMAIL with a view that is a union of both VALIDEMAIL and BLACKLIST.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With