Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove unknown characters using T-SQL?

I try to do a script to eliminate the Chinese characters in a string and between the Chinese characters I want to remove. Please see the example below. Thank you.

select LTRIM(SUBSTRING('Tower 6A 第6A座', PATINDEX('%[a-zA-Z0-9]%', 'Tower 6A 第6A座'), LEN('Tower 6A 第6A座')))
select LTRIM(REPLACE(SUBSTRING('Tower 6A 第6A座', CHARINDEX(N'樓', 'Tower 6A 第6A座') + 1, LEN('Tower 6A 第6A座')), ' ', ''))

Sample string:

Tower 6A 第6A座
Tower 3 第3座

Bad Result:

Tower 6A ?6A?
Tower6A?6A?
Tower 3 ?3?
Tower3?3?

Good result, I want to achieve:

Tower 6A
Tower 6A
Tower 3
Tower 3
like image 346
Patrick Pangilinan Avatar asked Dec 08 '25 13:12

Patrick Pangilinan


2 Answers

Try this

   SELECT Replace(Replace('Tower 6A 第6A座','[^a-zA-Z0-9]+', ''),'?','')
like image 195
Fmanin Avatar answered Dec 11 '25 01:12

Fmanin


Looks weird but does the job. However I wouldn't count on great performance:

;WITH string AS (
    SELECT N'Tower 6A 第6A座' s
),
split AS (
    select LEFT(s, 1) s_item,
       STUFF(s, 1, 1, N'') s
    from string
    union all
    select LEFT(s, 1),
       STUFF(s, 1, 1, N'')
    from split
    where s > ''
)
,
remove_non_ascii AS ( 
    SELECT s_item, UNICODE(s_item) s_unicode
    FROM split WHERE UNICODE(s_item)<256
)
SELECT STUFF((SELECT s_item FROM remove_non_ascii
FOR XML PATH, TYPE).value('.[1]', 'NVARCHAR(MAX)'), 1,0, '');

What it does:

  1. Splits strings into rows

  2. Eliminates UNICODE characters bigger than 256 (you can fiddle with condition)

  3. Puts the string together

It uses recursive query so in case of strings longer than 100 characters you need to expand amount of recursive loops by adding: OPTION (MAXRECURSION n) (where n is new amount of recursive loops)


Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!