
Store UTF8 data in UTF16 column

Tags:

c#

sql-server

xml

I'm storing XML in an XML column in SQL Server. SQL Server stores the data internally in UTF-16. Therefore the XML that is stored has to be in UTF-16.

The XML I have is in UTF-8; it has this declaration at the top:

<?xml version="1.0" encoding="UTF-8" ?>

When I try to insert xml with the UTF-8 declaration I get an exception saying something about the encoding. I can easily fix this in two ways:

  • by removing the declaration or

  • by changing the declaration to:

<?xml version="1.0" encoding="UTF-16" ?>

Problem

I don't know if it's 'safe' or correct to just remove or replace the declaration. Will I lose data, or will the XML become corrupt? Or do I have to convert the string in C# from UTF-8 to UTF-16?

user369117 asked Dec 29 '25 21:12



1 Answer

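C# strings are UTF-16, not UCS-2 (UCS-2 is the older, fixed-width predecessor of UTF-16 that has no surrogate pairs). So when you read UTF-8 data into a C# string, it is decoded to UTF-16, and it's that UTF-16 string that you transmit to SQL Server.

You can therefore change the XML declaration to encoding="UTF-16" or omit it altogether. Neither loses data: the declaration only describes how the text is encoded, and both UTF-8 and UTF-16 can represent every Unicode character. The difference between UCS-2 and UTF-16 only matters for characters outside the Basic Multilingual Plane, which UTF-16 encodes as surrogate pairs; C# strings carry those pairs intact.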
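As a rough illustration of the "omit it altogether" option, here is a minimal sketch. It assumes the XML arrives as UTF-8 bytes without a byte order mark; the class name `Utf8XmlToSqlServer` and the helper `ToDeclarationFreeXml` are made up for this example. It relies on `XDocument.ToString()` not emitting the XML declaration, and it only checks the round trip in memory; actually sending the string to SQL Server is left out.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class Utf8XmlToSqlServer
{
    // Hypothetical helper: decodes UTF-8 bytes and returns the XML
    // as a string with no XML declaration.
    // Assumption: the bytes do not start with a byte order mark.
    public static string ToDeclarationFreeXml(byte[] utf8Bytes)
    {
        // Decoding yields a .NET string, which is UTF-16 in memory.
        string xml = Encoding.UTF8.GetString(utf8Bytes);

        // Parsing from a string means the encoding named in the
        // declaration is not used for decoding. XDocument.ToString()
        // leaves the declaration out of its output.
        return XDocument.Parse(xml).ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Sample content: U+00E9 sits inside the Basic Multilingual Plane,
        // U+1F600 sits outside it (a surrogate pair in UTF-16).
        string original =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" +
            "<note>caf\u00E9 \U0001F600</note>";
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        string forSqlServer = ToDeclarationFreeXml(utf8Bytes);

        // The declaration is gone, and the text content is unchanged.
        bool hasDeclaration = forSqlServer.StartsWith("<?xml");
        bool sameText =
            XDocument.Parse(forSqlServer).Root.Value == "caf\u00E9 \U0001F600";
        Console.WriteLine($"declaration present: {hasDeclaration}");
        Console.WriteLine($"text preserved: {sameText}");
    }
}
```

The resulting string could then be passed as the value of an xml or nvarchar parameter. If you would rather keep a declaration, it has to say UTF-16 so that it matches what is actually being sent.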

Andomar answered Dec 31 '25 11:12
