
Is it possible to check string encoding?

In my application I import text into a database from files that users upload to the site. The database is SQL Server 2005, the text is stored in an nvarchar column, and I use EF and L2SQL.

Users are supposed to save their files as UTF-8, but unfortunately some of them apparently used a different encoding. As a result, some characters are invalid.

I'd like to find out which records are valid. I use utf8checker. It works fine on the original files, but when the text comes from the database, the IsUtf8 method always returns true.

jlp asked Dec 20 '25 07:12


1 Answer

I think SQL Server always stores Unicode (nvarchar) data as UCS-2, so the file's original encoding is gone once the text has been inserted. You need to make sure the data has the correct encoding at insert time rather than at read time. Otherwise the text is stored already garbled, and I don't think there is a way to determine the original encoding afterwards, unless you keep the encoding declaration with the record itself, for example in another column or in the first few characters of the data element. XML does it this way.
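To make the insert-time check concrete, here is a minimal sketch in Python (chosen only for brevity; the question is about .NET, where a strictly configured decoder would play the same role). The function name `decode_utf8_strict` and the sample bytes are hypothetical, and the lenient decode with replacement characters is an assumption about how the import step behaved, not something the question confirms.

```python
from typing import Optional


def decode_utf8_strict(raw: bytes) -> Optional[str]:
    """Return the decoded text if raw is well-formed UTF-8, else None."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


# Hypothetical upload contents: the same word saved in two encodings.
utf8_upload = "café".encode("utf-8")      # b'caf\xc3\xa9'
latin1_upload = "café".encode("latin-1")  # b'caf\xe9', not valid UTF-8

# At insert time the raw bytes still tell the two files apart.
print(decode_utf8_strict(utf8_upload) is not None)    # True
print(decode_utf8_strict(latin1_upload) is not None)  # False

# Assumed import behaviour: a lenient decode swaps the bad byte for U+FFFD,
# and that already-damaged text is what gets stored.
stored = latin1_upload.decode("utf-8", errors="replace")

# Read back and re-encoded, the damaged text is well-formed UTF-8 again,
# so a byte-level check can no longer flag it.
print(decode_utf8_strict(stored.encode("utf-8")) == stored)  # True
```

The last line is why a byte-level check always passes on text read back from the database: any Unicode string encodes to well-formed UTF-8, whatever bytes it originally came from. If the import really did substitute replacement characters, searching stored text for U+FFFD might flag damaged records, but that depends on the assumption above.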

Hope this helps.

Nabheet answered Dec 22 '25 19:12



