Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server uses high CPU when searching inside nvarchar strings

Check out the following example. It shows that searching within a unicode string (nvarchar) is almost eight times as bad as searching within a varchar string. And on par with implicit conversions. Looking for an explanation for this. Or a way to search within nvarchar strings more efficiently.

use tempdb
create table test
(
    testid int identity primary key,
    v varchar(36),
    nv nvarchar(36),
    filler char(500)
)
go

set nocount on
set statistics time off
insert test (v, nv)
select CAST (newid() as varchar(36)),
    CAST (newid() as nvarchar(36))
go 1000000

set statistics time on
-- search utf8 string
select COUNT(1) from test where v like '%abcd%' option (maxdop 1)
-- CPU time = 906 ms,  elapsed time = 911 ms.

-- search utf8 string using unicode (uses convert_implicit)
select COUNT(1) from test where v like N'%abcd%' option (maxdop 1)
-- CPU time = 6969 ms,  elapsed time = 6970 ms.

-- search unicode string
select COUNT(1) from test where nv like N'%abcd%' option (maxdop 1)
-- CPU time = 6844 ms,  elapsed time = 6911 ms.
like image 202
Michael J Swart Avatar asked Sep 08 '25 15:09

Michael J Swart


2 Answers

Looking for an explanation for this.

NVarchar is 16 bit and Unicode comparison rules are a lot more complicated than ASCII - special chars for the various languages that are supported at the same time require quote some more processing.

like image 169
TomTom Avatar answered Sep 10 '25 07:09

TomTom


My guess is that LIKE is implemented using an O(n^2) algorithm as opposed to an O(n) algorithm; it would probably have to be for the leading % to work. Since the Unicode string is twice as long, that seems consistent with your numbers.

like image 40
Larry Coleman Avatar answered Sep 10 '25 08:09

Larry Coleman