Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Word VBA program reads Emoji character (4 bytes) as "12"

Tags:

ms-word

vba

emoji

I created a document with just one "Thumb up" Emoji (Unicode code point U+1F44D) that I inserted via the standard Windows+; shortcut:

Windows + ; shortcut for Emojis

But I can't get its actual code point with VBA.

I get these values (debug):

text = 12
length = 2
arrBytes = { 49, 0, 50, 0 }

with the following Sub procedure:

Sub test()
    Dim text As String
    Dim length As Integer
    Dim arrBytes() As Byte
    text = ActiveDocument.Range.Characters(1).text
    length = Len(ActiveDocument.Range.Characters(1).text)
    arrBytes = ActiveDocument.Range.Characters(1).text
End Sub

But if I had inserted the same Emoji via the menu Insert > Symbol > Font "Segoe UI Emoji" > U+1F44D (Thumb up), the same Sub procedure gets the values I expect (in debug; ?? are not "real" characters, they are surrogate code points which individually mean nothing):

text = ??
length = 2
arrBytes = { 61, 216, 77, 220 }

(for information, this code decodes the two characters into &#x1F44D)

How to determine the actual character if the Emoji is inserted using Windows+;? (asking the users to choose the workaround above is not part of my question)

ADDENDUM May 26th: solution by @Florent B. works on all of my 3 computers (ActiveDocument.Content.InsertXML ActiveDocument.Content.XML). Reloading the XML may have impacts on VBA programs, for instance it renumbers the image "Shape IDs", but that's another story.

ADDENDUM May 22nd: for the symbol added with Windows+;, I can find the correct value (4 bytes { 61, 216, 77, 220 }) only in the XML property of the document Range object, but it requires that I parse the whole XML and determine which XML characters correspond to which positions of the Range objects, unfortunately I feel it requires a lot of knowledge or assumptions. Here is the part of the XML where I can see the 4 bytes (<w:t>??</w:t> where ?? correspond to the 4 bytes) :

  <?xml version="1.0" standalone="yes"?>
  <?mso-application progid="Word.Document"?>
  <w:wordDocument ...>
    ... (around 23.000 characters)
    <w:body>
      <wx:sect>
        <w:p wsp:rsidR="002703DB" wsp:rsidRDefault="003926FB">
          <w:r>
            <w:rPr>
              <w:rFonts w:ascii="Segoe UI Emoji" w:h-ansi="Segoe UI Emoji"/>
              <wx:font wx:val="Segoe UI Emoji"/>
            </w:rPr>
            <w:t>??</w:t>
          </w:r>
        </w:p>
        <w:sectPr wsp:rsidR="002703DB" wsp:rsidSect="002849CD"><w:pgSz w:w="11906"
                w:h="16838"/><w:pgMar w:top="1417" w:right="1417" w:bottom="1417"
                w:left="1417" w:header="708" w:footer="708" w:gutter="0"/><w:cols
                w:space="708"/><w:docGrid w:line-pitch="360"/></w:sectPr>
      </wx:sect>
    </w:body>
  </w:wordDocument>

The XML is almost the same when I insert the Emoji as a symbol, there are 2 more "rFonts":

    <w:body>
      <wx:sect>
        <w:p wsp:rsidR="00CD420D" wsp:rsidRDefault="00CD420D">
          <w:r>
            <w:rPr>
              <w:rFonts w:ascii="Segoe UI Emoji" w:fareast="Segoe UI Emoji"
                    w:h-ansi="Segoe UI Emoji" w:cs="Segoe UI Emoji"/>
              <wx:font wx:val="Segoe UI Emoji"/>
            </w:rPr>
            <w:t>??</w:t>
          </w:r>
        </w:p>
        <w:sectPr wsp:rsidR="00CD420D" wsp:rsidSect="002849CD"><w:pgSz w:w="11906"
                w:h="16838"/><w:pgMar w:top="1417" w:right="1417" w:bottom="1417"
                w:left="1417" w:header="708" w:footer="708" w:gutter="0"/><w:cols
                w:space="708"/><w:docGrid w:line-pitch="360"/></w:sectPr>
      </wx:sect>
    </w:body>
  </w:wordDocument>

PS: computers/softwares where I could reproduce the problem:

  • Computer 1 (lenovo X230):
    • MS Word Office 365 1904 (16.0.11601.20174) 32 bits, Windows 10 Professional 10.0.17763 x64
    • Also after upgrade to Office 365 1907 16.0.11901.20176, MSO (16.0.11901.20070) 32 bits, Windows 10 Professional 1809 17763.652 x64
  • Computer 2:
    • MS Word Office 365 1904 (16.0.11601.20184) 64 bits, Windows 10 Professional 1809 17763.503 x64
  • Computer 3 (Dell):
    • MS Word Office 365 ProPlus 1808 (16.0.10730.20334) 64 bits, Windows 10 Enterprise 10.0.17763 x64
like image 947
Sandra Rossi Avatar asked May 19 '19 09:05

Sandra Rossi


2 Answers

I hope this helps: building upon @SandraRossi's comments above, it seems the input from the emoji panel is not correctly translated to its surrogate code point. If you save a document containing both symbols ( one from the Emoji Panel, and the other from via the menu, as you described) as an XML doc, you notice the difference:

Emoji Input:

<w:r w:rsidR="003814F5">
  <w:rPr>
    <mc:AlternateContent>
      <mc:Choice Requires="w16se">
        <w:rFonts w:ascii="Segoe UI Emoji" w:hAnsi="Segoe UI Emoji"/>
      </mc:Choice>
      <mc:Fallback>
        <w:rFonts w:hint="eastAsia"/>
      </mc:Fallback>
    </mc:AlternateContent>
  </w:rPr>
  <mc:AlternateContent>
    <mc:Choice Requires="w16se">
      <w16se:symEx w16se:font="@SimHei" w16se:char="1F44D"/>
    </mc:Choice>
    <mc:Fallback>
      <w:t>👍</w:t>
    </mc:Fallback>
  </mc:AlternateContent>
</w:r>

Menu (symbol) input:

<w:r w:rsidR="003814F5">
  <w:rPr>
    <w:rFonts w:ascii="Segoe UI Emoji" w:hAnsi="Segoe UI Emoji"/>
  </w:rPr>
  <w:t xml:space="preserve"> is not 👍</w:t>
</w:r>

The line <w16se:symEx w16se:font="@SimHei" w16se:char="1F44D"/> is the key difference here. The normal (Menu -> Insert Symobol) emoji is used as a fallback.

It seems only Word has the issue. I tried the same emoji panel input on Excel (and PowerPoint), and I get the right values in debug ?? which translates to the Unicode code point U+1F44D both in Excel and when copied back to Word.

like image 149
AAA Avatar answered Nov 11 '22 02:11

AAA


Here are my final believes and findings.

It's probably a bug in MS Word VBA, based on the tests performed by AAA on Excel, Powerpoint and Word. Some people don't have this bug (cf comments).

The VBA objects give an invalid value for the emoji, but the XML property is correct. The XML is too much complex to be parsed easily, so the simplest workaround was found by Florent B. in the comments, which consists in "recreating the document from itself":

ActiveDocument.Content.InsertXML ActiveDocument.Content.XML

Unfortunately, in my personal case, it may have some collateral effects like shape IDs are renumbered.

So, I extended the code above to only correct the emoji characters in the original document, the rest remains intact, by:

  • copying the XML to a new document,
  • then parsing every character whose text length is > 1 in the new document (i.e. those outside the Unicode Basic Multilingual Plane, containing the Emojis and many other characters too),
  • also parsing the original document (assuming the characters should be in the same order as in the new document and their text lengths are the same),
  • copying those characters from the new document back to the original document,
  • closing the new document.

Okay, the macros runs longer, but I couldn't find a better solution.

Here is my code, simplified (you may be surprised by the useless collection of Range objects, where each Range is one Character object, in fact I don't provide the original code for the function Split_Into_Ranges, which is much bigger but faster, but it works and demonstrates well the solution in the sub correct_emojis):

Sub test()

    Dim text As String
    Dim length As Integer
    Dim arrBytes() As Byte

    Dim zranges As Collection
    Set zranges = Split_Into_Ranges(ActiveDocument)

    Call correct_emojis(zranges) ' <=== here the important algorithm

    text = ActiveDocument.Range.Characters(1).text
    length = Len(ActiveDocument.Range.Characters(1).text)
    arrBytes = ActiveDocument.Range.Characters(1).text

End Sub

Function Split_Into_Ranges(ioDocument As Document) As Collection

    Dim zranges As Collection
    Set zranges = New Collection
    For i = 1 To ioDocument.Characters.Count
        zranges.Add ioDocument.Characters(i)
    Next
    Set Split_Into_Ranges = zranges

End Function

Sub correct_emojis(zranges As Collection)

    Dim current_emoji_zranges As Collection
    Dim temp_zranges As Collection
    Dim temp_emoji_zranges As Collection
    Dim doc_current As Document
    Dim doc_temp As Document
    Dim arrBytes() As Byte

    Set doc_current_zranges = get_emoji_zranges(zranges)
    If doc_current_zranges.Count = 0 Then
        Exit Sub
    End If

    Set doc_current = ActiveDocument
    Set doc_temp = Documents.Add()
    Call doc_temp.Content.InsertXML(doc_current.Content.XML)
    Set temp_zranges = Split_Into_Ranges(doc_temp)

    Set current_emoji_zranges = get_emoji_zranges(zranges)
    Set temp_emoji_zranges = get_emoji_zranges(temp_zranges)

    For i = 1 To current_emoji_zranges.Count
        If 0 = 1 Then
            arrBytes = current_emoji_zranges(i).Characters(1).text
            arrBytes = temp_emoji_zranges(i).Characters(1).text
        End If
        current_emoji_zranges(i).Characters(1).text = temp_emoji_zranges(i).Characters(1).text
    Next

    Call doc_temp.Close(False)

End Sub

Function get_emoji_zranges(zranges As Collection) As Collection

    Dim emoji_zranges As Collection

    Set emoji_zranges = New Collection
    For i = 1 To zranges.Count
        If Len(zranges(i).text) > zranges(i).Characters.Count Then
            For j = 1 To zranges(i).Characters.Count
                If Len(zranges(i).Characters(j).text) > 1 Then
                    emoji_zranges.Add (zranges(i))
                End If
            Next
        End If
    Next

    Set get_emoji_zranges = emoji_zranges

End Function
like image 21
2 revs Avatar answered Nov 11 '22 04:11

2 revs