
List of Unicode latin subscript letters [closed]

Thanks to jmcnamara I found a great way to use Unicode characters in xlsxwriter charts: xlsxwrter: rich text format in chart title

I need a list of all Unicode characters to copy from. I found some:

  • https://unicode-table.com/en/blocks/superscripts-and-subscripts/
  • https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts

Why is there no alphabet for capital subscript letters? Where can I get those?

masterofpuppets asked Oct 18 '25 11:10


1 Answer

Unicode is a character set that maps characters to numbers (code points). It deals only with plain text and is not meant for formatting text§. You can't make a letter bold or italic, or move it above or below the baseline, purely with Unicode code points (see Create Unicode subscripts and superscripts with combining glyphs).

Characters that seem to represent formatting exist mainly because older standards already had them. The Unicode FAQ gives the reason directly:

Q: Why doesn't Unicode have a full set of superscripts and subscripts?

A: The superscripted and subscripted characters encoded in Unicode are either compatibility characters encoded for roundtrip conversion of data from legacy standards, or are actually modifier letters used with particular meanings in technical transcriptional systems such as IPA and UPA. Those characters are not intended for general superscripting or subscripting of arbitrary text strings—for such textual effects, you should use text styles or markup in rich text, instead.

https://www.unicode.org/faq/ligature_digraph.html
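
If you just want a list to copy from, you can ask the Unicode database that ships with Python which subscript letters exist. This is a minimal sketch: the helper name `chars_named` is mine, and it assumes the official character names begin with "LATIN SUBSCRIPT SMALL LETTER" (which is how the ones I know of, such as ₐ and ₓ, are named). The exact result depends on the Unicode version bundled with your Python build.

```python
import sys
import unicodedata


def chars_named(prefix):
    """Return every character whose Unicode name starts with `prefix`."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # name() takes a default, so unnamed/unassigned code points yield ""
        if unicodedata.name(ch, "").startswith(prefix):
            found.append(ch)
    return found


small = chars_named("LATIN SUBSCRIPT SMALL LETTER")
capital = chars_named("LATIN SUBSCRIPT CAPITAL LETTER")

for ch in small:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

print("capital subscript letters found:", len(capital))
```

The second search is there to check the "no capital subscript alphabet" observation from the question against your own Unicode data rather than taking it on faith.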

Compatibility is also why the superscript digits ¹²³ often look different from the remaining ones ⁰⁴⁵⁶⁷⁸⁹: many fonts contain only the former set, so the latter end up being drawn from a different font. And ¹ (U+00B9) comes after ² and ³ (U+00B2, U+00B3) because ISO 8859-1 ordered them that way.
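
You can see the split between the two sets by printing the code points. A small sketch using only the standard library; the variable names are mine:

```python
import unicodedata

superscripts = "⁰¹²³⁴⁵⁶⁷⁸⁹"

for ch in superscripts:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# Only the digits inherited from ISO 8859-1 (Latin-1) fit in one byte there
latin1 = [ch for ch in superscripts if ord(ch) <= 0xFF]
print("in ISO 8859-1:", latin1)
```

Only ¹, ² and ³ fall in the Latin-1 range (U+0000 to U+00FF); the others live in the Superscripts and Subscripts block at U+2070 and above.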

In fact, almost anything that seems silly in Unicode is there for compatibility with older character sets. There are lots of examples of a seemingly unnecessary single code point that represents a sequence of characters, such as Nj, Dž, Ⅷ, ㎉, ㎓ and ﷽. Similarly, there are seemingly unnecessary emoji such as the “copyright” ©️, “registered trademark” ®️ and “trademark” ™️ symbols. Other character sets already had them, so Unicode had to include them too in order to convert to and from those sets without losing data.


§ More information about rich text in Unicode:

Rich Text. Also known as styled text. The result of adding information to plain text. Examples of information that can be added include font data, color, formatting information, phonetic annotations, interlinear text, and so on. The Unicode Standard does not address the representation of rich text. It is expected that systems and applications will implement proprietary forms of rich text. Some public forms of rich text are available (for example, ODA, HTML, and SGML). When everything except primary content is removed from rich text, only plain text should remain.

https://unicode.org/glossary/#rich_text (emphasis mine)

Q: What is the difference between “rich text” and “plain text”?

A: Rich text is text with all its formatting information: typeface, point size, weight, kerning, and so on. Plain text is the underlying content stream to which formatting is applied.

One key distinction between the two is that rich text breaks the text up into runs and applies uniform formatting to each run. As such, rich text is inherently stateful. Plain text is not stateful. It should be possible to lose the first half of a block of plain text without any impact on rendering.

Unicode, by design, only deals with plain text. It doesn't provide a generalized solution to rich text issues.

https://www.unicode.org/faq/ligature_digraph.html

phuclv answered Oct 20 '25 11:10


