I'm reading the ECMAScript Specification, 5th edition, but there is a point that is not entirely clear to me.
In Section 6 - Source Text, the specification defines a source character as follows:
SourceCharacter::
any Unicode code unit
and right after that it says:
Throughout the rest of this document, the phrase "code unit" and the word "character" will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of text. The phrase "Unicode character" will be used to refer to the abstract linguistic or typographical unit represented by a single Unicode scalar value (which may be longer than 16 bits and thus may be represented by more than one code unit).
I think this sentence is a bit ambiguous, because someone (as I did initially) might think that the only allowed characters are those between 0 and 65535 in the Unicode table.
So, is the sentence ambiguous, or can only characters in the range 0-65535 be used?
It is intentional: they're telling you that any code unit is allowed, and then clarifying that, after the definition of SourceCharacter, the word "character" typically means a code unit rather than a Unicode character.
Note that in UTF-16 a code unit is different from a code point.
Every code unit is 16 bits, but a single code point may be represented by more than one code unit.
For example, "💩" is a single Unicode code point (U+1F4A9), but in UTF-16 it is encoded as two code units (a surrogate pair):
"💩".charCodeAt(0) // 55357
"💩".charCodeAt(1) // 56589