The following example is taken from the Strings and Characters documentation:

The values 55357 (0xD83D in hex) and 56374 (0xDC36 in hex) form the surrogate pair for the Unicode scalar U+1F436, which is the DOG FACE character. Is there any way to go in the other direction? That is, can I convert a surrogate pair back into a scalar?
I tried
let myChar: Character = "\u{D83D}\u{DC36}"
but I got an "Invalid Unicode scalar" error.
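For what it's worth, an escape for the combined scalar itself does compile; it's only the surrogate halves that Swift rejects:
let dogFace: Character = "\u{1F436}" // 🐶 DOG FACE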
This Objective-C answer and this project seem to be custom solutions, but is there anything built into Swift (especially Swift 2.0+) that does this?
There are formulas to calculate the original code point based on a surrogate pair and vice versa. From https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae:
Section 3.7 of The Unicode Standard 3.0 defines the algorithms for converting to and from surrogate pairs.
A code point C greater than 0xFFFF corresponds to a surrogate pair <H, L> as per the following formulas:

H = Math.floor((C - 0x10000) / 0x400) + 0xD800
L = (C - 0x10000) % 0x400 + 0xDC00

The reverse mapping, i.e. from a surrogate pair <H, L> to a Unicode code point C, is given by:

C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
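The reverse mapping translates directly into Swift. Here's a minimal sketch (Swift 2 syntax, to match the code below; the function name is my own, not a standard library API):

func scalarFromSurrogates(high: UInt16, _ low: UInt16) -> UnicodeScalar? {
    // Only accept values that are actually in the high/low surrogate ranges.
    guard high >= 0xD800 && high <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF else {
        return nil
    }
    // C = (H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000
    let c = (UInt32(high) - 0xD800) * 0x400 + (UInt32(low) - 0xDC00) + 0x10000
    return UnicodeScalar(c) // always a valid scalar in 0x10000...0x10FFFF here
}

scalarFromSurrogates(0xD83D, 0xDC36) // U+1F436, DOG FACE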
Given a sequence of UTF-16 code units (i.e. 16-bit numbers, such as you get from String.utf16, or just an array of numbers), you can use the UTF16 type and its decode method to turn them into UnicodeScalar values, which you can then combine into a String.
It's a bit of a grungy API: decode takes a generator (since it does stateful processing) and returns an enum that indicates either a result (with the decoded scalar as an associated value), an error, or the end of the input. Swift 2.0 pattern matching makes it a lot easier to use:
let u16data: [UInt16] = [0xD83D, 0xDC36]
// or: let u16data = "Hello, 🐶".utf16
var g = u16data.generate()  // decode() advances this generator statefully
var s = ""
var utf16 = UTF16()
while case let .Result(scalar) = utf16.decode(&g) {
    s.append(scalar)  // append each decoded UnicodeScalar
}
print(s) // prints 🐶
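If you only need the String and Foundation is available, NSString's UTF-16 initializer is bridged to String as well; a quick check, assuming that import (this goes through Objective-C bridging rather than the stdlib decoder):

import Foundation

let units: [UInt16] = [0xD83D, 0xDC36]
let dog = String(utf16CodeUnits: units, count: units.count)
print(dog) // 🐶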