Check out the following example. It shows that searching within a unicode string (nvarchar) is almost eight times as bad as searching within a varchar string. And on par with implicit conversions. Looking for an explanation for this. Or a way to search within nvarchar strings more efficiently.
use tempdb
create table test
(
testid int identity primary key,
v varchar(36),
nv nvarchar(36),
filler char(500)
)
go
set nocount on
set statistics time off
insert test (v, nv)
select CAST (newid() as varchar(36)),
CAST (newid() as nvarchar(36))
go 1000000
set statistics time on
-- search utf8 string
select COUNT(1) from test where v like '%abcd%' option (maxdop 1)
-- CPU time = 906 ms, elapsed time = 911 ms.
-- search utf8 string using unicode (uses convert_implicit)
select COUNT(1) from test where v like N'%abcd%' option (maxdop 1)
-- CPU time = 6969 ms, elapsed time = 6970 ms.
-- search unicode string
select COUNT(1) from test where nv like N'%abcd%' option (maxdop 1)
-- CPU time = 6844 ms, elapsed time = 6911 ms.
Looking for an explanation for this.
NVarchar is 16 bit and Unicode comparison rules are a lot more complicated than ASCII - special chars for the various languages that are supported at the same time require quote some more processing.
My guess is that LIKE
is implemented using an O(n^2) algorithm as opposed to an O(n) algorithm; it would probably have to be for the leading %
to work. Since the Unicode string is twice as long, that seems consistent with your numbers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With