In Excel, how can I convert the contents of a cell which includes accented characters, curly quotes etc into either HTML for the same characters, OR a transliterated plaintext version?
We have an XLS document which contains some "high" characters. The data has been pulled in via a DB connection, and it appears that Excel is correctly handling individual cells (or rows) being in different codepages.
When we export this data to a CSV, some high characters are not rendered correctly. It appears that Excel uses a single encoding for the exported document (of course) but writes each character's byte value from its original codepage, which may or may not be consistent with other values in the same document.
As Excel renders the text correctly before export, I believe we should be able to encode the high characters to their HTML equivalents at this point, then export to CSV, thus ensuring that the CSV is ASCII-only.
(Alternatively we could transliterate down to plain ASCII, but that seems like a poor approach and probably no easier ...)
There is a function by pgc01 that seems to do the trick here: http://www.mrexcel.com/forum/showpost.php?p=2091183&postcount=7
Hopefully it's ok for me to quote their code:
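Function CodeUni(s As String, Optional bHex As Boolean = True)
    ' Returns the Unicode code of the FIRST character of s only.
    If bHex Then
        ' Four-digit, zero-padded hex, e.g. "FEFC"
        CodeUni = Right("0000" & Hex(AscW(Left(s, 1))), 4)
    Else
        ' AscW returns a signed Integer, so mask it to get 0..65535
        CodeUni = AscW(Left(s, 1)) And &HFFFF&
    End If
End Function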
In case you're not sure how to get that into your Excel workbook, this guide is pretty useful: http://office.microsoft.com/en-us/excel-help/create-custom-functions-in-excel-2007-HA010218996.aspx
To summarise:
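To get it as a proper HTML numeric character reference, the hex code needs an &#x prefix (a bare &# is only valid in front of a decimal code), so the formula is:
="&#x"&CodeUni(C1, TRUE)&";"
In my test case, I had ﻼ in C1, for which this formula displays &#xFEFC;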
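CodeUni only looks at the first character of the cell, so converting a whole cell needs a loop. The following is a minimal sketch of one way to do that, not part of pgc01's post: HtmlEncodeNonAscii is a hypothetical name, and the function assumes you want decimal references (&#NNN;) for everything above code 127 while leaving plain ASCII untouched.

```vba
' Hypothetical helper (not from the original post): replaces every
' non-ASCII character in s with a decimal HTML numeric character reference.
Function HtmlEncodeNonAscii(ByVal s As String) As String
    Dim i As Long, code As Long, low As Long
    Dim ch As String, result As String

    i = 1
    Do While i <= Len(s)
        ch = Mid$(s, i, 1)
        ' AscW returns a signed Integer; mask it to get 0..65535
        code = AscW(ch) And &HFFFF&

        ' Combine a UTF-16 surrogate pair into a single code point
        If code >= &HD800& And code <= &HDBFF& And i < Len(s) Then
            low = AscW(Mid$(s, i + 1, 1)) And &HFFFF&
            If low >= &HDC00& And low <= &HDFFF& Then
                code = &H10000& + (code - &HD800&) * &H400& + (low - &HDC00&)
                i = i + 1 ' skip the low surrogate
            End If
        End If

        If code > 127 Then
            result = result & "&#" & code & ";"
        Else
            result = result & ch
        End If
        i = i + 1
    Loop

    HtmlEncodeNonAscii = result
End Function
```

With this in a module, =HtmlEncodeNonAscii(C1) can be filled down a helper column before exporting to CSV. Note that it does not escape a literal & already present in the cell; if the CSV will be read as HTML, that would need handling separately.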